8000 [MRG] GaussianMixture with BIC/AIC by tingshanL · Pull Request #26735 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] GaussianMixture with BIC/AIC #26735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Conversation

tingshanL
Copy link

Reference Issues/PRs

Fixes #19338. Automates the selection in Gaussian Mixture Model Selection. Adds a basic GaussianMixtureIC estimator without initializing with agglomerative clustering as discussed in #19562.

What does this implement/fix? Explain your changes.

It automatically selects the best GM model based on BIC or AIC among a set of models that are parameterized by:

  • Covariance constraints

  • Number of components

Any other comments?

@github-actions
Copy link
github-actions bot commented Jun 30, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e2f9a77. Link to the linter CI: here

@tingshanL tingshanL marked this pull request as ready for review July 25, 2023 21:18
@tingshanL tingshanL changed the title GaussianMixture with BIC/AIC [MRG] GaussianMixture with BIC/AIC Jul 12, 2024
@jovo
Copy link
jovo commented Jul 23, 2024

@amueller @NicolasHug What do you think?

@jovo
Copy link
jovo commented Jul 30, 2024

@jjerphan @ogrisel This PR is designed to greatly simplify user's life when trying to do something like in this tutorial: https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html, and is highly simplified relative to other recently proposed PRs, such as #19562. We are grateful for your feedback, and look forward to finalizing this.

@jovo
Copy link
jovo commented Aug 13, 2024

@adam2392 Hey Adam - Curious to hear what you think about this?

@tingshanL
Copy link
Author

@adam2392 Hey Adam - Curious to hear what you think about this?

Hello Adam @adam2392,

We have reviewed and addressed all feedback provided here. Below is a summary of the primary comments and our responses:

  • "significantly richer init capabilities than the underlying GaussianMixture class in scikit-learn": currently, GaussianMixtureIC selects models only based on covariance constraints and the number of components, the same as the GM model selection example in the library.
  • "orders of magnitude slower than GaussianMixture": using current GaussianMixtureIC to run the same task as in the example does not take more time. On my computer, both fitting procedures took around 1.3s.
  • "to configure continuous integration to run the tests, including a test to run the check_estimator function to make sure that the code stays compatible with future scikit-learn versions": all checks, including the tests related to check_estimator, have passed.
  • "joblib-based multi-threading": it is no longer in the code.
  • "regularization": also not included in the code anymore.

With these revisions, we believe the code is ready for the next review. Please let us know if there is anything further we should adjust. Thank you very much for your time and insights!

@adam2392
Copy link
Member

Hey @tingshanL okay thanks!

I think this will require a discussion among some of the more senior maintainers on the team. Personally, I do see the use of a mclust like algorithm within Python/sklearn ecosystem, since I have used it in the past in R. However, we'll have to see what others say… I know they're busy, and including new models is not super easy, so thank you for the patience.

Out of curiosity (perhaps you can just summarize in the PR description if we want to reserve the space for other discussion), how does this differ from mclust? If there are significant differences, what would need to be done to make this 1-1 matching? This is just some high level info I'm curious on to guide the discussion, so feel free to not spend too much time addressing this question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gaussian Mixture with BIC/AIC
5 participants
0