-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Array dimension issue when using sklearn.covariance.fast_mcd #4512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'd submit a PR pasting these arguments in header in order to see how the
patch looks like with tests fixed. Just to know what it takes to do what
you want.
|
The fix itself does not appear to break anything as per PR #4517. If you choose to merge it, feel free to close this issue. Otherwise, please let me know if there is any additional work necessary to incorporate this fix. Thanks again! :) |
As I said in the PR, I think the right fix is this: #4511 which should fix it across scikit-learn. |
sorry I missed it. Let's get #4511 done then.
|
Unfortunately this didn't get fixed in #5152, as this is inside a function. High time we have common tests for functions [or deprecate them?]. |
Fixed by @vighneshbirodkar in #5234. |
Lines 362-368 in
sklearn/covariance/robust_covariance.py
specify the following:The problem with this is that if you pass in a 1D array of shape (n_samples,), you typically want the univariate estimate of the MCD for all those samples (hence you actually wanted to pass in an array of shape (n_samples, 1)). However, the code above assumes you really wanted to pass in a 1D array of shape (1, n_features), which has the following problems:
), which contains the reference implementation of FastMCD by Van Dreissen and Rombouts. This can be found in the
mcdcov.m
file, lines 213-215.Because of these reasons, I'm going to assume this is a bug, which appears similar to the bugs found in #4509 and #4466. The fix is simple, just change the above lines to the following:
With this fix, finding univariate estimates of the MCD becomes much easier. I have made the above changes to my fork at https://github.com/ThatGeoGuy/scikit-learn and can submit a pull request at any time. However, while running nosetests, I could not correctly get the tests to complete. I also could not find any documentation mentioning how to run the tests so
sklearn.__check_build
will run appropriately.Any advice is appreciated. I am examining the
MinCovDet
/fast_mcd
so that I can hopefully fix issue #3367, which is currently preventing me from completing a project I am working on.The text was updated successfully, but these errors were encountered: