Scaling features using MinMaxScaler makes DPGMM always have one cluster · Issue #6694 · scikit-learn/scikit-learn · GitHub
Closed
@HTCode

Description


I have noticed that if I scale my dataset using MinMaxScaler(), then DPGMM always creates a single cluster (label), whatever value I use for alpha. This might be related to a numerical precision issue.

If I don't rescale the data, or if I use StandardScaler() instead of MinMaxScaler(), this problem does not occur (i.e., DPGMM creates more than one cluster).

Is this a bug in sklearn.mixture.DPGMM, or did I miss something?

API is here: http://scikit-learn.org/stable/modules/generated/sklearn.mixture.DPGMM.html#sklearn.mixture.DPGMM

I have also tried this on the artificial data in this example (from the official site): http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm.html#example-mixture-plot-gmm-py

It works, but if I rescale the generated dataset X by adding the following line, then DPGMM creates only one cluster:

from sklearn.preprocessing import MinMaxScaler
X = MinMaxScaler().fit_transform(X)  # added right after X is generated
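For reference, here is a minimal NumPy-only sketch (with made-up two-blob data, not the example's dataset) of how the two scalers transform data differently: min-max scaling compresses every feature into [0, 1], leaving per-feature variance well below 1, while standard scaling keeps zero mean and unit variance. That shrunken scale is one plausible way a min-max-scaled dataset could interact badly with DPGMM's priors, though this sketch does not itself prove that is the cause.

```python
import numpy as np

def min_max_scale(X):
    # Map each feature to [0, 1], as MinMaxScaler does by default.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def standard_scale(X):
    # Zero mean, unit variance per feature, as StandardScaler does.
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.RandomState(0)
# Two well-separated blobs (hypothetical data, just for illustration).
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 20.0])

X_mm = min_max_scale(X)
X_std = standard_scale(X)

# Min-max scaling squeezes the data into the unit box, so each
# feature's variance is well below 1; standard scaling keeps it at 1.
print("min-max variances:", X_mm.var(axis=0))
print("standard variances:", X_std.var(axis=0))
```

The separation between the two blobs survives both transforms, but its absolute magnitude is much smaller after min-max scaling, which is the kind of scale change that can make a scale-sensitive prior dominate the likelihood.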
