Add 2D2PCA (Two-directional two-dimensional PCA) #1503
Conversation
It seems that a new estimator should handle sparse matrices before passing the Travis tests, and I don't know why. The PCA2D class expects a 3D array, while scipy.sparse only has 2D matrices. Is there a way to use a 3D sparse matrix in scikit-learn? Otherwise, for sparse input I will be obliged to assume that the size of the third dimension is 1, and with that assumption PCA2D becomes similar to SparsePCA.
It is OK for an algorithm not to support sparse matrices. The tests check whether the algorithm either supports sparse matrices or gives an informative error. I guess somehow the test doesn't like the way you raised the error, but I'd have to investigate. More generally: I'm not familiar with the algorithm you implemented. Is it specific to images? We don't really have image-specific algorithms; those are more appropriate for scikit-image.
OK, I understand now. If the algorithm does not support sparse matrices it should raise a TypeError, and the error message should contain the word "sparse" (see sklearn/tests/test_common.py, line 269). No, the algorithm is not specific to images, although many applications will use them. It can be applied to any 2D/1D data.
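For illustration, a minimal sketch of how such a check might look (the class name PCA2D comes from this PR; the message wording and the rest of the method are just placeholders):

```python
import numpy as np
import scipy.sparse as sp


class PCA2D:
    """Skeleton showing only the sparse-input check."""

    def fit(self, X, y=None):
        if sp.issparse(X):
            # The common tests expect a TypeError whose message mentions "sparse".
            raise TypeError("PCA2D does not support sparse input; "
                            "convert X to a dense array first.")
        X = np.asarray(X)
        # ... actual fitting on the dense (3D) array would happen here ...
        return self
```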
Ok, thanks for your comment.
Maybe you can comment at least on the first point; I'd like to hear others' opinions on this.
What is the reference paper? A good criterion for judging whether an algorithm is a good candidate for addition to scikit-learn is the number of citations of its reference paper.
OK.
-- The 2D PCA algorithm is almost the same as PCA (the methodology and justification are the same), so anyone using PCA can use 2D PCA. In fact, as I have implemented it, 2D PCA can receive 2D data, and in that case it outputs almost the same results as PCA.
-- In many cases (especially for images), in order to use PCA we reshape every sample to 1D. For example, if we have 500 samples of dimensions 1000 x 1000, we reshape them into a 500 x 1,000,000 matrix before applying PCA. With standard PCA this is really slow: an SVD of a 1,000,000 x 1,000,000 matrix or of a 500 x 1,000,000 matrix, depending on the implementation. 2D PCA works directly on the 500 x 1000 x 1000 data array, so the SVD is applied to a 1000 x 1000 matrix and is much more efficient than PCA (see the sketch below).
In conclusion, I would say that 2D PCA is well known and at least as useful as PCA. The implementation supports 2D or 3D arrays; in the 2D case, 2D PCA = PCA. It should also be possible to use a faster SVD implementation for 2D PCA; for example, we could use randomized_svd and get a Randomized 2D PCA.
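For reference, here is a rough sketch of the two-directional 2D PCA idea described in the papers cited in this thread: eigendecompose the small row- and column-direction covariance matrices of the image stack and project each image from both sides. The function name, einsum formulation and variable names are illustrative assumptions, not the code in this PR:

```python
import numpy as np


def pca_2d2d(images, n_row_components, n_col_components):
    """images: array of shape (n_samples, height, width)."""
    mean = images.mean(axis=0)
    centered = images - mean

    # Column-direction covariance: sum_i A_i^T A_i, shape (width, width)
    cov_cols = np.einsum('ijk,ijl->kl', centered, centered) / len(images)
    # Row-direction covariance: sum_i A_i A_i^T, shape (height, height)
    cov_rows = np.einsum('ikj,ilj->kl', centered, centered) / len(images)

    # Leading eigenvectors of each small covariance matrix
    # (eigh returns eigenvalues in ascending order, hence the reversal).
    _, vec_c = np.linalg.eigh(cov_cols)
    _, vec_r = np.linalg.eigh(cov_rows)
    X = vec_c[:, ::-1][:, :n_col_components]   # (width, n_col_components)
    Z = vec_r[:, ::-1][:, :n_row_components]   # (height, n_row_components)

    # Project every image from both sides: Z^T A_i X
    return np.einsum('hd,ihw,wk->idk', Z, centered, X)
```

The key point from the comment above: the eigendecompositions are done on matrices whose size is the image side length (e.g. 1000 x 1000), never on the flattened feature dimension.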
Reference papers: 2. Yang, Jian; Zhang, David; Frangi, Alejandro F.; Yang, Jing Y.: Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1 (2004), pp. 131-137. http://repository.lib.polyu.edu.hk/jspui/bitstream/10397/190/1/137.pdf (1725 citations)
In terms of number of citations, this seems relevant for inclusion in scikit-learn. Can you add the two paper references to the docstring? Have a look at other files for an example.
This technique is based on 2D matrices as opposed to standard PCA which
is based on 1D vectors. It considers simultaneously the row and column
directions. 2D PCA is computed by examining the covariance of the data.
It can easily get higher accuracy than 1D PCA.
PCA is unsupervised so what you mean by accuracy is unclear...
Quickly browsing the code, I think that this looks like a good candidate for inclusion. You will need to fix style mistakes and create documentation in the user guide (what we call "narrative documentation"). Before investing time in the documentation, you can wait to see if other people are also OK with the idea of adding it.
@amueller Many matrix factorization problems have been extended to more general tensor factorizations. I think it could be nice to support some in scikit-learn. Grid search and cross-validation should a priori be fine since they work on the first axis. For pipelines, we will need transformers for flattening the 3D array to 2D and vice versa.
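A hedged sketch of the kind of flattening transformer this would require, assuming the usual BaseEstimator/TransformerMixin pattern (the class name FlattenTransformer is hypothetical, not part of the PR):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class FlattenTransformer(BaseEstimator, TransformerMixin):
    """Flatten each sample of an (n_samples, height, width) array to 1D."""

    def fit(self, X, y=None):
        # Remember the per-sample shape so inverse_transform can restore it.
        self.sample_shape_ = X.shape[1:]
        return self

    def transform(self, X):
        return X.reshape(X.shape[0], -1)

    def inverse_transform(self, X):
        return X.reshape((X.shape[0],) + self.sample_shape_)
```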
I'm undecided on this but wouldn't oppose it. By the way @mblondel, if you have some time I'd really appreciate your input on #1485 and #1491.
I have a question about PCA/SparsePCA ... with implications for PCA2D.
@yedtoss this should probably be consistent throughout scikit-learn. I would imagine it is not at the moment.
@amueller Let's call v1 the convention using shape (n_samples = n, 1) and v2 the convention using shape (1, n_features = n). Then PCA, RandomizedPCA, ProbabilisticPCA, ProjectedGradientNMF, NMF, FactorAnalysis, SparseCoder, DictionaryLearning, MiniBatchDictionaryLearning, FastICA and LDA use v1.
Why do you think it makes more sense?
I disagree with you when you say that both don't make sense. I am talking about the following case:
-- I have used n_samples of data to find the principal components of all my data.
-- Then I may want to reduce the dimensions of n samples where n = 1; in other words, I may want to reduce only one sample. If one sample is a 1D array of shape (n_features,), then I can send to the …
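To make the two conventions concrete, a tiny illustration (v1/v2 follow the naming in this thread; the arrays are just examples, and the point is that an explicit 2D shape removes the ambiguity of a bare 1D array):

```python
import numpy as np

data = np.arange(5.0)         # a bare 1D array of length n = 5

v1 = data.reshape(-1, 1)      # (n_samples = 5, n_features = 1)
v2 = data.reshape(1, -1)      # (n_samples = 1, n_features = 5)

print(v1.shape, v2.shape)     # (5, 1) (1, 5)
```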
Yes, PCA2D should interpret it in the …
Can I now write the narrative documentation for 2DPCA?
I am still unsure about the inclusion of 2D2PCA. Maybe we should discuss it on the mailing list.
Yes, I agree. It should be consistent in all estimators.
@yedtoss Could you create a GitHub repository or gist and add it to https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets?
OK.
Closing as out of scope. Adding it to a gist / linking it on the website etc. is still welcome.
I added Two-directional two-dimensional PCA in sklearn/decomposition/pca_2d.py.
Tests are added in sklearn/decomposition/tests/test_pca_2d.py,
and an example is in examples/applications/face_recognition_pca_2d.py.
Reference:
Zhang, Daoqiang; Zhou, Zhi-Hua: Two-directional two-dimensional PCA for efficient face representation and recognition, Neurocomputing, Volume 69, Issues 1-3, December 2005, Pages 224-231.