undesirable behavior from check_array function when passed a 1D numpy array in 0.16.1 #4877
Comments
Thanks for the report. Currently support for 1d input is inconsistent, though it shouldn't raise that bad an error. In the future, giving 1d vectors will always raise an error, telling you to do
So my impromptu fix to the example was the correct way to go about it. Good to know. I'm using Anaconda with scikit-learn 0.16.1 and Python 3.4.3
@xuewei4d maybe make sure that this doesn't happen in the new implementation... I am surprised that we give such a bad error :-/
Sure. @amueller I repeated this code. @JLConawayII, correct me if I am wrong. gmm If
Yes, that looks right. When it returns the array, the dimensions are switched. To me it seems like a strange choice to have the function work this way. Personally I wouldn't have a validation tool change the input array at all, but check_array seems to do several things at once. It seems especially unnecessary in this case, since immediately after check_array is called in score_samples there is code to convert the 1D array into an (n, 1) array: in gmm.py, lines 310-311 add an axis to the 1D array if necessary, and then lines 314-315 check for a shape mismatch. I'm not really sure what the intention was here.
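The shape problem described in this comment can be reproduced with NumPy alone. This is a minimal sketch, not the scikit-learn source: it assumes the 1D promotion in 0.16.1 behaves like np.atleast_2d, which matches the output shown in the report. The names promote_like_check_array and n_features are hypothetical and used only for illustration.

```python
import numpy as np

def promote_like_check_array(X):
    # Assumption: mimics the reported 0.16.1 behavior, where a 1D array
    # of n values comes back with shape (1, n) rather than (n, 1).
    return np.atleast_2d(np.asarray(X))

x = np.linspace(-3, 3, 7)          # 7 samples of a single feature
X = promote_like_check_array(x)    # now 2D: 1 row, 7 columns

# The later "add an axis if 1D" step never fires, because X is already 2D.
if X.ndim == 1:
    X = X[:, np.newaxis]

n_features = 1                     # hypothetical: a model fitted on 1 feature
shape_mismatch = X.shape[1] != n_features

print(X.shape, shape_mismatch)
```

Under that assumption, the array is read as one sample with seven features, so the shape check that follows reports a mismatch.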
This was an oversight on my part when I introduced the check_array function. It made many things much simpler, but it seems I overlooked this check (I had to edit all files). We are now switching to raising an error whenever a 1d array is passed, with a deprecation cycle.
@amueller Okay.
So is there a resolution for this issue? Should a more specific issue be created?
Yes. I think so. Or we create another PR to fix it? @jnothman
Yes. Well the fixed version of #4511. The current version is bs.
Fixed via #5152.
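As a rough illustration of what a deprecation cycle for 1D input could look like, here is a sketch. It is not the code from the PRs mentioned in this thread: check_2d_with_deprecation and the warning text are hypothetical, and keeping the old (1, n) promotion during the deprecation period is an assumption.

```python
import warnings
import numpy as np

def check_2d_with_deprecation(X):
    """Hypothetical validator: warn on 1D input instead of silently reshaping."""
    X = np.asarray(X)
    if X.ndim == 1:
        warnings.warn(
            "Passing 1D arrays as data is deprecated; reshape your data "
            "to 2D with one row per sample.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: the old (1, n) promotion is kept until the warning
        # is turned into an error at the end of the deprecation period.
        X = np.atleast_2d(X)
    return X

# Record warnings so the behavior can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = check_2d_with_deprecation(np.array([1, 2, 3, 4, 5]))
```

Once the cycle ends, the warn call would become a raise ValueError with the same kind of message, so callers get a clear instruction instead of a shape error deep inside an estimator.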
I'm going to be using Gaussian Mixture Models for my research, and I thought I would run some examples to see how the package works. When I tried running the 1D Gaussian Mixture Example located here http://www.astroml.org/book_figures/chapter4/fig_GMM_1D.html it kicked back this error in PyCharm:
Traceback (most recent call last):
  File "/home/jconaway/Research/Kepler_Analysis_2/gaussian_mixture_example.py", line 86, in <module>
    logprob, responsibilities = M_best.score_samples(x)
  File "/home/jconaway/anaconda3/lib/python3.4/site-packages/sklearn/mixture/gmm.py", line 315, in score_samples
    raise ValueError('The shape of X is not compatible with self')
ValueError: The shape of X is not compatible with self
Process finished with exit code 1
I figured it should have worked as-is, so I did some exploring as to why it wasn't working correctly. I found that when M_best.score_samples(x) reaches X = check_array(X) on line 309 of gmm.py, it looks like it doesn't return the correct array. Here's some doodling around I did with some arbitrary arrays:
In [1]: import numpy as np
In [2]: x = np.linspace(-3,3,7)
In [3]: y = x[:,np.newaxis]
In [4]: x
Out[4]: array([-3., -2., -1.,  0.,  1.,  2.,  3.])
In [5]: y
Out[5]:
array([[-3.],
       [-2.],
       [-1.],
       [ 0.],
       [ 1.],
       [ 2.],
       [ 3.]])
In [6]: y.shape[1]
Out[6]: 1
In [7]: from sklearn.utils import validation
In [8]: Q = np.array([1,2,3,4,5])
In [9]: q = validation.check_array(Q)
In [10]: q
Out[10]: array([[1, 2, 3, 4, 5]])
In [11]: s = q[:,np.newaxis]
In [12]: s
Out[12]: array([[[1, 2, 3, 4, 5]]])
When I went back to the example and changed it to this:
s = x[:,np.newaxis]
logprob, responsibilities = M_best.score_samples(s)
it worked fine. That's as far as I've gotten with this. Hope it's helpful.
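The same fix can be wrapped in a small helper so that 1D data is always turned into a column before it reaches the estimator. This is only a sketch: as_column is a hypothetical name, not part of scikit-learn, and it assumes each element of a 1D array is one sample of a single feature.

```python
import numpy as np

def as_column(x):
    """Hypothetical helper: return x as an (n_samples, 1) array."""
    x = np.asarray(x)
    if x.ndim == 1:
        # Same reshaping as x[:, np.newaxis] in the fix above.
        return x[:, np.newaxis]
    return x

x = np.linspace(-3, 3, 7)
s = as_column(x)
print(s.shape)
```

The resulting s can then be passed to M_best.score_samples(s) exactly as in the two-line fix above. Arrays that are already 2D pass through unchanged.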