Robust covariance (fast_mcd) does not handle singular covariance matrices #3367
Comments
Sorry for the long silence. If you are still interested, a fix would be very welcome.
Hey, sorry for the long wait. I haven't had time to commit to this issue since I have been busy with final projects / exams. I submitted a PR which I believe fixes this issue (PR #4635), but it still produces some warnings with one of the sample datasets I used to test the fix. I believe the problem with that sample dataset (described in the PR) is different from what caused this issue. Overall, I think this issue is solved: despite the warnings on some of the trials, the fix still produces the correct answer for exact fit scenarios.
I'm starting fresh to try and tackle this again, two years later (sorry, it seems I really delayed my old fix, which is now incompatible to an extent with the old code). I'm gonna try and write some test cases and submit a new PR. Sorry for being such a wreck about actually fixing this; I think I was quite close to getting everything correct last time. For the record, I will also test whatever data I generate against the reference implementation in LIBRA, which should let me know if I change things too much.
…ce matrix (#8328)
* Adds failing test for issue 3367
* Adds extra comments to describe what I'm testing

  Basically in this instance I discovered this bug independently from #3367 because I was trying to use principal components to estimate plane normals / geometry of points in 3D space. When you have a set of points that specify a perfect plane though (or in the case of wanting to use MCD, you have a subset of your points that specify a perfect plane), then the code fails because the covariance matrix is singular. However, if your covariance matrix is singular, you've already found the set of points with the lowest determinant. As per Rousseeuw & Van Driessen 1999, at this point you can stop searching. The code did stop searching; however, it raised a ValueError on singular matrices for no reason. So the correct fix should be to remove that.
* Fixes issue 3367

  This should work with the test case provided.
* Adds missing argument to test
* Style corrections to pass flake runner

  Implements the style corrections as mentioned in pull request #8328
* Adds entry for PR #8328 to what's new
* Adds review changes from @jnothman
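To illustrate the behaviour described in this commit message, here is a minimal sketch of a C-step loop that treats a singular covariance as a stopping condition instead of an error. It is not scikit-learn's code: the names `mean_and_cov`, `det2`, and `c_steps` are hypothetical, the data is restricted to 2-D, and exact rational arithmetic (`fractions.Fraction`) is assumed so that "determinant equals zero" can be tested exactly rather than with a tolerance.

```python
from fractions import Fraction


def mean_and_cov(points):
    """Mean and 1/n-normalised covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = Fraction(sum(p[0] for p in points), n)
    my = Fraction(sum(p[1] for p in points), n)
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def det2(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def c_steps(points, start_idx, max_iter=30):
    """Iterate C-steps from the subset start_idx (hypothetical helper).

    Returns (location, covariance, determinant, subset indices).
    A zero determinant ends the search without raising: the subset
    lies on a line, so no other subset can have a smaller determinant.
    """
    idx = sorted(start_idx)
    h = len(idx)
    for _ in range(max_iter):
        loc, cov = mean_and_cov([points[i] for i in idx])
        det = det2(cov)
        if det == 0:
            # Exact fit: stop here instead of raising an error.
            return loc, cov, det, idx
        sxx, sxy, syy = cov
        dist = []
        for i, (x, y) in enumerate(points):
            dx, dy = x - loc[0], y - loc[1]
            # Squared Mahalanobis distance via the closed-form 2x2 inverse.
            d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
            dist.append((d2, i))
        # Keep the h points closest to the current estimate.
        new_idx = sorted(i for _, i in sorted(dist)[:h])
        if new_idx == idx:
            return loc, cov, det, idx  # converged
        idx = new_idx
    loc, cov = mean_and_cov([points[i] for i in idx])
    return loc, cov, det2(cov), idx


# Six points on the line y = 2x + 1, followed by three off-line points.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 11),
          (0, 5), (4, -3), (6, 2)]
loc, cov, det, idx = c_steps(points, [0, 1, 2, 3, 4, 5])
```

Starting from the six collinear points, the loop returns immediately with a zero determinant and the mean of those points as the location. With floating-point data an exact comparison with zero would not be reliable, so a real implementation would need some tolerance or rank test; that choice is outside this sketch.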
When the c_step of fast_mcd encounters a covariance matrix with a zero determinant, it raises an error. According to the paper (Rousseeuw & Van Driessen 1999), this should not happen: a zero-determinant covariance is technically the optimal solution, and the singular case requires special treatment (Section 5 of the paper). I'm new here, but I can contribute to this.
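For reference, the reason a zero determinant is optimal can be written out in a few lines. The notation below ($H$, $h$, $m_H$, $S_H$, $\lambda_j$, $a$) is introduced here for illustration and is not taken from the thread.

```latex
% Covariance of a subset H of size h, with mean m_H:
S_H = \frac{1}{h} \sum_{i \in H} (x_i - m_H)(x_i - m_H)^{\top}

% S_H is positive semidefinite, so its eigenvalues satisfy \lambda_j \ge 0 and
\det S_H = \prod_{j=1}^{p} \lambda_j \;\ge\; 0 .

% Hence \det S_H = 0 attains the lower bound of the MCD objective.
% It occurs exactly when some \lambda_j = 0, i.e. when there is a vector a \ne 0 with
a^{\top}(x_i - m_H) = 0 \quad \text{for all } i \in H ,
% meaning that all h points of the subset lie on a common hyperplane.
```

This is the "exact fit" situation mentioned in the comments: once such a subset is found, no further C-step can lower the determinant.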