Added error messages in case user provides one dimensional data by Carldeboer · Pull Request #4845 · scikit-learn/scikit-learn · GitHub

Added error messages in case user provides one dimensional data #4845


Closed
wants to merge 1 commit into from

Carldeboer

When learning a GMM on one-dimensional data, both the weights and the data can be represented as 1-D vectors. However, this breaks GMM.fit(), which used to fail with cryptic error messages. This change explicitly checks for one-dimensional input and raises an error with an informative message instead.
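For illustration, here is a minimal sketch of the kind of up-front check described above. The actual diff is not shown on this page, so the helper name check_2d_input and its message wording are hypothetical; only NumPy is assumed.

```python
import numpy as np


def check_2d_input(X):
    """Hypothetical helper: reject 1-D input before fitting a mixture model.

    Mixture models expect X with shape (n_samples, n_features). A 1-D array
    of shape (n_samples,) is ambiguous, so raise an explicit error that says
    how to reshape it, instead of failing later with a cryptic one.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        raise ValueError(
            "Expected 2-D data of shape (n_samples, n_features), got a 1-D "
            "array of shape %r. For a single feature use X.reshape(-1, 1)."
            % (X.shape,)
        )
    if X.ndim != 2:
        raise ValueError("Expected 2-D data, got %d dimensions." % X.ndim)
    return X


# Usage: a single feature must be passed as a column vector.
data = np.array([0.1, 0.4, 5.2, 5.5])
try:
    check_2d_input(data)
except ValueError as exc:
    msg = str(exc)
    print(msg)
X = check_2d_input(data.reshape(-1, 1))
print(X.shape)  # (4, 1)
```

Failing early like this keeps the error close to the user's call rather than deep inside the fitting code.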

@amueller
Member

We should actually fix this for all estimators and test it consistently. I'm not sure we want to include this fix; we should probably change check_array instead, as discussed in #4511.

@Carldeboer
Author

Agreed that that would be a better solution, but I don't think it would address problems that arise when self.means_ (or self.covars_) has the wrong shape. Also, I'm running into a similar but distinct issue with DPGMM, this time in predict:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/unix/cgdeboer/.local/lib/python2.7/site-packages/sklearn/mixture/gmm.py", line 357, in predict
    logprob, responsibilities = self.score_samples(X)
  File "/home/unix/cgdeboer/.local/lib/python2.7/site-packages/sklearn/mixture/dpgmm.py", line 274, in score_samples
    self.covariance_type)
  File "/home/unix/cgdeboer/.local/lib/python2.7/site-packages/sklearn/mixture/dpgmm.py", line 105, in _bound_state_log_lik
    bound[:, k] -= 0.5 * _sym_quad_form(X, means[k], precs[k])
  File "/home/unix/cgdeboer/.local/lib/python2.7/site-packages/sklearn/mixture/dpgmm.py", line 86, in _sym_quad_form
    q = (cdist(x, mu[np.newaxis], "mahalanobis", VI=A) ** 2).reshape(-1)
  File "/home/unix/cgdeboer/.local/lib/python2.7/site-packages/scipy-0.14.0b1-py2.7-linux-x86_64.egg/scipy/spatial/distance.py", line 1973, in cdist
    raise ValueError('XA and XB must have the same number of columns '
ValueError: XA and XB must have the same number of columns (i.e. feature dimension.)

So far, I haven't been able to determine the cause this time. The DPGMM was trained on the same 2-D data that worked with the GMM, and the same call, cluster = myMM.predict(np.reshape(bestAxis, (-1, 1))), works for the GMM.
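The traceback ends in scipy's cdist, which requires both inputs to have the same number of columns, so one plausible reading (not confirmed in this thread) is that the X passed to predict and the stored means have different feature counts. The sketch below computes the same squared Mahalanobis distance as the _sym_quad_form line in the traceback, but with plain NumPy instead of cdist and an explicit shape check; the name sym_quad_form_checked is hypothetical.

```python
import numpy as np


def sym_quad_form_checked(X, mu, A):
    """Squared Mahalanobis distance of each row of X from mu, given precision A.

    Hypothetical NumPy stand-in for cdist(X, mu[np.newaxis], "mahalanobis", VI=A) ** 2,
    with a check so that a feature-count mismatch reports the shapes involved.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    A = np.asarray(A, dtype=float)
    if (X.ndim != 2 or mu.ndim != 1 or X.shape[1] != mu.shape[0]
            or A.shape != (mu.shape[0], mu.shape[0])):
        raise ValueError(
            "Shape mismatch: X %r, mean %r, precision %r; X must be "
            "(n_samples, n_features) with n_features == len(mean)."
            % (X.shape, mu.shape, A.shape)
        )
    diff = X - mu  # (n_samples, n_features)
    # Row-wise diff_i^T A diff_i, i.e. the squared Mahalanobis distance.
    return np.einsum("ij,jk,ik->i", diff, A, diff)


X = np.array([[0.0], [1.0], [3.0]])  # 3 samples, 1 feature
mu = np.array([1.0])                 # one-feature mean
A = np.array([[4.0]])                # precision = 1 / variance
d2 = sym_quad_form_checked(X, mu, A)
print(d2)  # squared distances 4, 0, 16

# A two-feature mean against one-feature data hits the same failure mode,
# but the message names the shapes.
try:
    sym_quad_form_checked(X, np.array([1.0, 2.0]), np.eye(2))
except ValueError as exc:
    msg = str(exc)
    print(msg)
```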

@amueller amueller added the Bug label Jun 11, 2015
@xuewei4d
Contributor

There is a test case for one-dimensional data, but there the data has shape (n, 1). I agree with fixing the problem of (n,)-shaped data in check_array.
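To make the distinction concrete: the existing test passes a column of shape (n, 1), while the failing case passes a flat array of shape (n,). A short NumPy illustration, with arbitrary values:

```python
import numpy as np

flat = np.array([0.5, 1.5, 2.5])  # shape (3,): one sample or three?
column = flat.reshape(-1, 1)      # shape (3, 1): 3 samples, 1 feature
row = flat.reshape(1, -1)         # shape (1, 3): 1 sample, 3 features

for name, arr in [("flat", flat), ("column", column), ("row", row)]:
    print(name, arr.shape, arr.ndim)
```

A flat array could mean either a single sample or a single feature, so a shared validator such as check_array has to either pick one interpretation or reject the input.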

Regarding the self.means_ problem you mentioned: did you set your own prior means_ before calling _fit? @Carldeboer

@Carldeboer
Author

@xuewei4d Yes, that is indeed what I did; I wanted to initialize the means. If the means are given as a 1-D array, GMM fails.

@xuewei4d
Contributor

@Carldeboer I think you probably assigned means_ directly. Currently, GMM has no public method for initializing the model parameters (means_, covars_, and so on); there is only the _set_var method, and if you try that, it will raise a ValueError. I am working on refactoring GMM, and this is within the scope of that work.
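Since GMM has no public initializer, one option would be to validate user-supplied means before fitting. The sketch below is hypothetical (validate_means and its messages do not exist in scikit-learn); it assumes means must have shape (n_components, n_features) and adds a reshape hint for the one-feature case discussed here.

```python
import numpy as np


def validate_means(means, n_components, n_features):
    """Hypothetical check for user-supplied means_ before fitting a GMM.

    Requires shape (n_components, n_features). A flat 1-D vector is
    rejected; for one-feature data the message suggests reshaping it
    into a column of shape (n_components, 1).
    """
    means = np.asarray(means, dtype=float)
    expected = (n_components, n_features)
    if means.shape == expected:
        return means
    hint = ""
    if means.ndim == 1 and n_features == 1 and means.shape[0] == n_components:
        hint = " For one-feature data, use means.reshape(-1, 1)."
    raise ValueError(
        "means_ must have shape %r (n_components, n_features), got %r.%s"
        % (expected, means.shape, hint)
    )


# Usage: three components on one-feature data.
init = np.array([0.0, 5.0, 10.0])
try:
    validate_means(init, n_components=3, n_features=1)
except ValueError as exc:
    msg = str(exc)
    print(msg)
means = validate_means(init.reshape(-1, 1), n_components=3, n_features=1)
print(means.shape)  # (3, 1)
```

Rejecting rather than silently reshaping keeps the user responsible for the layout while still explaining the fix.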

@xuewei4d xuewei4d mentioned this pull request Jun 18, 2015
@amueller
Member
amueller commented Sep 9, 2015

Arguably fixed via #5152, which will protect against 1-D X. I would argue that the user is responsible for setting means_ correctly, according to the documentation.

@amueller amueller closed this Sep 9, 2015