Description
Describe the workflow you want to enable
If I'm using TruncatedSVD in a pipeline, it'd be nice to have an option that automatically sets n_components to a value below n_features whenever n_components >= n_features.
For example, the docs for sklearn.manifold.TSNE suggest using TruncatedSVD to reduce the dimensionality of the input to 50. This is easy to do with a pipeline:
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline
from scipy.sparse import random
tsne = make_pipeline(TruncatedSVD(n_components=50), TSNE(n_iter=250))
wide_data = random(1000, 100)
wide_tsne = tsne.fit_transform(wide_data)
However, let's say later we get a narrower dataset:
narrow_data = random(1000, 10)
narrow_tsne = tsne.fit_transform(narrow_data)
This raises the error: n_components must be < n_features; got 50 >= 10
Describe your proposed solution
I'd like to add a parameter to the __init__ of TruncatedSVD, with a name like excess_n_components. The default would be something like excess_n_components="error", which preserves the current behavior. However, with excess_n_components="reduce_n_components" (or some other good way to specify it), at fit time we'd automatically reset n_components to X.shape[1] - 1. (With maybe a special case for when X.shape[1] == 1?)
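To make the idea concrete, here is a minimal sketch of the proposed behavior as a user-side subclass. CappedTruncatedSVD is hypothetical, and the excess_n_components name and its values are assumptions from this proposal, not an existing scikit-learn API:

```python
from sklearn.decomposition import TruncatedSVD


class CappedTruncatedSVD(TruncatedSVD):
    """Hypothetical sketch of the proposed excess_n_components option."""

    def __init__(self, n_components=2, excess_n_components="error"):
        super().__init__(n_components=n_components)
        self.excess_n_components = excess_n_components

    def fit_transform(self, X, y=None):
        if (
            self.excess_n_components == "reduce_n_components"
            and self.n_components >= X.shape[1]
        ):
            # Reset n_components to X.shape[1] - 1, the largest value
            # TruncatedSVD accepts. (X.shape[1] == 1 would still need
            # a special case, since 0 components is invalid.)
            self.n_components = X.shape[1] - 1
        return super().fit_transform(X, y)
```

With excess_n_components="reduce_n_components", the narrow-data example above would then produce a 9-component embedding instead of raising.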
Describe alternatives you've considered, if relevant
Writing a wrapper for Pipeline that generates different pipelines depending on the source data. This gets difficult, however, if the pipeline contains intermediate steps that may increase or decrease the dimensionality of the data.
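For reference, a minimal sketch of that wrapper approach (make_tsne_pipeline is a hypothetical helper name, and it only works when no step before TruncatedSVD changes the dimensionality):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline


def make_tsne_pipeline(X, n_components=50):
    """Build a TruncatedSVD -> TSNE pipeline sized for this particular X."""
    # Cap n_components at n_features - 1, the largest value TruncatedSVD allows.
    k = min(n_components, X.shape[1] - 1)
    return make_pipeline(TruncatedSVD(n_components=k), TSNE())
```

The pipeline must then be rebuilt for each dataset, which is exactly the boilerplate the proposed option would remove.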