MCD Exact Fit: Fixes Issue #3367 by ThatGeoGuy · Pull Request #4635 · scikit-learn/scikit-learn

Status: Closed · wants to merge 6 commits

Conversation

ThatGeoGuy (Contributor)

The following patch fixes issue #3367, to the best of my knowledge. However, while testing various "exact fit" scenarios, I encountered an issue that I believe originates somewhere else in the code. The data I used to test this is a simple plane (all Z = 0) with a few outliers present. The exact-fit scenario works and no issues arise.

However, when I take the same data and convert it into the affine subspace spanned by the data, which is essentially a rotation about the principal axes plus a translation by the mean, the fit occasionally raises an error reporting that the determinant is larger than the previous determinant. Both datasets can be found at https://gist.github.com/ThatGeoGuy/713bd1355b87ea2d5d07, where good_data.txt is the original data and bad_data.txt is the same data transformed into its affine subspace.

The final result is still correct in the end even though a couple of the trials trigger the above error, and it only happens with the bad_data.txt dataset. I am unsure what is causing this, but it appears to be a different problem from the one behind #3367.
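For anyone who wants to reproduce the setup without downloading the gist, here is a rough sketch; the sample sizes, random seed, and outlier placement below are illustrative rather than the exact contents of good_data.txt / bad_data.txt:

```python
# Rough reproduction of the exact-fit scenario: inliers on the plane Z = 0
# plus a few off-plane outliers. Sizes, seed, and ranges are assumptions.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.RandomState(42)

# 95 inliers lying exactly on the plane Z = 0 (degenerate / exact-fit data).
inliers = np.column_stack([rng.uniform(-10, 10, size=(95, 2)), np.zeros(95)])

# 5 outliers scattered off the plane.
outliers = rng.uniform(-10, 10, size=(5, 3))

X = np.vstack([inliers, outliers])          # "good_data"-style input

# "bad_data"-style input: the same points rotated into the affine subspace
# they span (principal axes) after centering on the mean.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_rotated = X_centered @ Vt.T

# With the patch both fits should handle the exact-fit case; on master the
# planar data triggers the error described in #3367.
mcd_good = MinCovDet(random_state=0).fit(X)
mcd_bad = MinCovDet(random_state=0).fit(X_rotated)
print(mcd_good.covariance_)
print(mcd_bad.covariance_)
```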

@amueller (Member)

Thanks for the PR. I don't suppose @VirgileFritsch is around to review?
I'll try to find time to study the paper, but it might be a while, sorry.

@ThatGeoGuy (Contributor, Author)

Hey all, just wondering whether any problems have been raised with this commit. I'm in no rush, as it has worked flawlessly for me so far (the results match LIBRA almost exactly, apart from some floating-point differences), but I'm wondering whether there is any additional material or resources this needs in order to move forward.

As far as I can see the project is quite busy, so I don't mean to rush anyone, but it has been a few months and I wanted to check in.

@agramfort (Member)

Any chance you could add a test?

@ThatGeoGuy (Contributor, Author)

I could add a test, but I'm not sure how. I've never used continuous integration before, and I don't know how to commit a test to the repo. Is there a guide somewhere that can point me in the right direction?

@amueller (Member)

The test should use a small synthetic dataset (such as good_data.txt) for which there was an error before your fix but not after.
You should add a function that generates that data and checks that the result is correct to `sklearn/covariance/tests/test_robust_covariance.py`. The function name should start with `test_`.

Run the tests using `nosetests -sv sklearn/covariance/tests/test_robust_covariance.py`. It should fail on master and pass on your branch.
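Something along these lines might work as a starting point; the data generation and the assertion below are only illustrative, and the real test should use a dataset that actually fails on master:

```python
# Sketch of a test for sklearn/covariance/tests/test_robust_covariance.py
import numpy as np
from sklearn.covariance import MinCovDet


def test_mcd_exact_fit_plane():
    # Points lying exactly on the plane Z = 0, plus a few outliers.
    # Before the fix this kind of degenerate data raised an error (#3367).
    rng = np.random.RandomState(0)
    inliers = np.column_stack([rng.uniform(-10, 10, size=(95, 2)), np.zeros(95)])
    outliers = rng.uniform(-10, 10, size=(5, 3))
    X = np.vstack([inliers, outliers])

    mcd = MinCovDet(random_state=0).fit(X)

    # The robust location should land (approximately) on the plane.
    assert abs(mcd.location_[2]) < 1e-7
```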

@ThatGeoGuy (Contributor, Author)

Unfortunately I let this slip while working through quite a bit during my Master's. My patch has since diverged from upstream quite significantly, so I need to take another look and tackle the issue fresh; I'm closing this PR. However, I now have a much better idea of how to get the tests built and running, so I expect my next attempt will be much better. I sincerely apologize for letting this sit for so long.
