TruncatedSVD option to reduce k instead of raising "n_components must be < n_features;" · Issue #17916 · scikit-learn/scikit-learn · GitHub
TruncatedSVD option to reduce k instead of raising "n_components must be < n_features;" #17916
Open
@zachmayer

Description

Describe the workflow you want to enable

If I'm using TruncatedSVD in a pipeline, it'd be nice to have an option that automatically reduces n_components to below n_features whenever n_components >= n_features.

For example, the docs for sklearn.manifold.TSNE suggest using TruncatedSVD to limit the dimensionality of the input to 50. This is easy to do with a pipeline:

from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline
from scipy.sparse import random

tsne = make_pipeline(TruncatedSVD(n_components=50), TSNE(n_iter=250))

wide_data = random(1000, 100)
wide_tsne = tsne.fit_transform(wide_data)

However, let's say later we get a narrower dataset:

narrow_data = random(1000, 10)
narrow_tsne = tsne.fit_transform(narrow_data)

This raises the error: n_components must be < n_features; got 50 >= 10

Describe your proposed solution

I'd like to add a parameter to the __init__ for TruncatedSVD, with a name like excess_n_components. The default would be something like excess_n_components="error", which would preserve the current behavior.

However, if excess_n_components="reduce_n_components" (or some other good way to specify it), at fit time we'd automatically reset n_components to X.shape[1]-1. (With maybe a special case for when X.shape[1]==1?)
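To make the proposal concrete, here is a rough sketch of the behavior as a standalone wrapper rather than a change to TruncatedSVD itself. The class name ClippedTruncatedSVD and the fitted attributes svd_ and n_components_ are hypothetical, and the max(1, ...) handling of the single-feature case is only an assumption, since that case is still an open question above.

```python
from scipy.sparse import random as sparse_random
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import TruncatedSVD


class ClippedTruncatedSVD(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper illustrating the proposed option.

    Not part of scikit-learn; the parameter name and values follow
    the suggestion in this issue.
    """

    def __init__(self, n_components=2, excess_n_components="error",
                 random_state=None):
        self.n_components = n_components
        self.excess_n_components = excess_n_components
        self.random_state = random_state

    def fit(self, X, y=None):
        n_features = X.shape[1]
        k = self.n_components
        if (k >= n_features
                and self.excess_n_components == "reduce_n_components"):
            # Assumption: fall back to 1 when there is a single feature.
            # Whether TruncatedSVD accepts that depends on the version.
            k = max(1, n_features - 1)
        # With excess_n_components="error", k is passed through unchanged
        # and TruncatedSVD raises its usual ValueError.
        self.svd_ = TruncatedSVD(n_components=k,
                                 random_state=self.random_state)
        self.svd_.fit(X)
        self.n_components_ = k  # the value actually used
        return self

    def transform(self, X):
        return self.svd_.transform(X)


narrow_data = sparse_random(1000, 10, density=0.1, format="csr",
                            random_state=0)
svd = ClippedTruncatedSVD(n_components=50,
                          excess_n_components="reduce_n_components",
                          random_state=0)
reduced = svd.fit_transform(narrow_data)
```

The wrapper leaves n_components untouched and records the clipped value in n_components_, so the constructor parameters are not modified at fit time and cloning or refitting on wider data still starts from the requested 50.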

Describe alternatives you've considered, if relevant

Writing a wrapper for Pipeline that generates different pipelines depending on the source data. This gets difficult, however, if the pipeline contains intermediate steps that may increase or decrease the dimensionality of the data.
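For reference, the simplest form of this alternative might look like the sketch below. The helper name make_tsne_pipeline and its max_components argument are hypothetical. It only inspects the raw input, which is exactly why it breaks down once an earlier step changes the number of features.

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline


def make_tsne_pipeline(X, max_components=50):
    """Hypothetical factory: choose n_components from the raw input."""
    # Only valid if no step before TruncatedSVD changes the number of
    # features; otherwise X.shape[1] is not what the SVD step will see.
    k = min(max_components, X.shape[1] - 1)
    return make_pipeline(TruncatedSVD(n_components=k), TSNE())


wide_pipe = make_tsne_pipeline(sparse_random(1000, 100))
narrow_pipe = make_tsne_pipeline(sparse_random(1000, 10))
```

Here wide_pipe keeps n_components=50 while narrow_pipe gets 9, but the caller has to rebuild the pipeline for every dataset instead of reusing a single estimator.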

Additional context
