Add n_components_ to SparsePCA · Issue #16752 · scikit-learn/scikit-learn
Closed
@krassowski

Description

PCA exposes the number of components through the n_components_ attribute; this is not possible with SparsePCA, even though both PCA and SparsePCA accept an n_components argument.

Would it make sense to expose n_components_ on SparsePCA too? Note that this would differ from n_components, which is already available but holds the unprocessed input argument (i.e. None if nothing was passed).

The current PCA behaviour:

from sklearn.decomposition import PCA, SparsePCA
from sklearn import datasets

iris = datasets.load_iris()
pca = PCA()
pca.fit(iris.data)
assert pca.n_components_ == 4
assert pca.n_components is None
assert len(pca.components_) == 4

pca_3 = PCA(n_components=3)
pca_3.fit(iris.data)
assert pca_3.n_components_ == 3
assert pca_3.n_components == 3
assert len(pca_3.components_) == 3

Existing SparsePCA behaviour:

spca = SparsePCA()
spca.fit(iris.data)
assert spca.n_components is None
assert len(spca.components_) == 4

spca_3 = SparsePCA(n_components=3)
spca_3.fit(iris.data)
assert spca_3.n_components == 3
assert len(spca_3.components_) == 3

Proposed SparsePCA behaviour:

assert spca.n_components_ == 4
assert spca_3.n_components_ == 3

This could also be added to KernelPCA and other PCA methods. Implementation-wise, the code that PCA uses to calculate the number of components could be generalised (that is, replacing None with the actual number and/or trimming by the number of features or samples). I think it might be placed in _BasePCA, but neither SparsePCA nor KernelPCA actually descends from it. Is this the right direction?
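To make the generalisation idea concrete, here is a minimal sketch of what a shared helper might look like. The name resolve_n_components and its signature are hypothetical and not part of scikit-learn, and the real estimators may default or validate differently (for example, by raising an error instead of trimming).

```python
from typing import Optional


def resolve_n_components(
    n_components: Optional[int], n_samples: int, n_features: int
) -> int:
    """Hypothetical helper (not a scikit-learn API).

    Replaces None with a concrete number and trims the requested value
    so it never exceeds the number of samples or features.
    """
    # Assumption: the upper bound is the smaller of the two data dimensions.
    upper_bound = min(n_samples, n_features)
    if n_components is None:
        return upper_bound
    return min(n_components, upper_bound)
```

With the iris shape used above (150 samples, 4 features), such a helper would turn None into 4 and leave an explicit 3 unchanged.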

On a related note, would it make sense to have a computed property named n_non_trivial_components_ giving the number of components that have non-zero loadings?
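As a rough illustration of what that property could compute, the sketch below counts the rows of a components_-like array that contain at least one non-zero loading. The function name is hypothetical, and treating "non-trivial" as "has any entry not exactly equal to zero" is an assumption; a tolerance might be preferable in practice.

```python
def count_non_trivial_components(components):
    """Hypothetical helper (not a scikit-learn API).

    `components` is any 2D row-iterable, such as a fitted estimator's
    components_ array or a list of lists. A row counts as non-trivial
    when at least one of its loadings is not exactly zero.
    """
    return sum(1 for row in components if any(value != 0 for value in row))
```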

Edit: a simple workaround is to use len(spca.components_), which works equally well for sparse and dense PCA. I am not sure whether adding n_components_ is needed, but the point is that it would be great to have a consistent interface for all PCA methods!
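The workaround can be wrapped in a small function so calling code does not need to know which estimator it was given. This is only a sketch: get_n_components is a hypothetical name, and the stand-in objects below merely mimic fitted estimators that do or do not expose n_components_.

```python
from types import SimpleNamespace


def get_n_components(estimator):
    """Hypothetical helper (not a scikit-learn API).

    Prefers the fitted n_components_ attribute when the estimator has one
    and otherwise falls back to the number of rows in components_.
    """
    fitted = getattr(estimator, "n_components_", None)
    if fitted is not None:
        return fitted
    return len(estimator.components_)


# Stand-ins for fitted estimators, used only to illustrate both code paths.
dense_like = SimpleNamespace(n_components_=3, components_=[[1, 0], [0, 1], [1, 1]])
sparse_like = SimpleNamespace(components_=[[1, 0], [0, 1], [0, 0], [1, 1]])
```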
