Bug: GMM ``score()`` returns an array, not a value. · Issue #2473 · scikit-learn/scikit-learn · GitHub
Bug: GMM score() returns an array, not a value. #2473
Closed
@jakevdp

Description

The GMM.score() function returns an array, rather than a single value. This is inconsistent with the rest of scikit-learn: for example both sklearn.base.ClassifierMixin and sklearn.base.RegressorMixin implement a score() function which returns a single number, as do KMeans, KernelDensity, PCA, GaussianHMM, and others.

Currently, GMM.score() returns an array of the individual scores for each sample: this should probably be called GMM.score_samples(), and GMM.score() should return sum(GMM.score_samples()).
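
The proposed split can be sketched with a toy estimator. This is only an illustration of the intended semantics, not scikit-learn code: ToyGaussian is a hypothetical 1-D Gaussian density model, and only the method names score_samples() and score() come from the proposal above.

```python
import math


class ToyGaussian:
    """Hypothetical 1-D Gaussian density estimator (not a scikit-learn class)."""

    def fit(self, X):
        n = len(X)
        self.mean_ = sum(X) / n
        # Maximum-likelihood variance; the small floor avoids division by zero.
        self.var_ = max(sum((x - self.mean_) ** 2 for x in X) / n, 1e-12)
        return self

    def score_samples(self, X):
        """Per-sample log-likelihoods: one value for each sample."""
        const = -0.5 * math.log(2 * math.pi * self.var_)
        return [const - (x - self.mean_) ** 2 / (2 * self.var_) for x in X]

    def score(self, X):
        """Total log-likelihood: a single number, as proposed for GMM.score()."""
        return sum(self.score_samples(X))


model = ToyGaussian().fit([0.0, 1.0, 2.0, 3.0])
per_sample = model.score_samples([1.0, 2.0])  # list with one entry per sample
total = model.score([1.0, 2.0])               # single float
```

Here score() is defined purely in terms of score_samples(), so the two cannot drift apart, and anything that expects a scalar score receives one.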

Note that in the last release, we renamed GMM.eval() to GMM.score_samples(). I believe this was a mistake: the name score_samples has a very general meaning (e.g. it is used within KernelDensity), while GMM.eval() returns a tuple containing the per-cluster likelihoods, which makes sense only for GMM.

If this change were made so that GMM.score() returned a single number, then the following recipe would work to optimize a GMM model (as it does for, e.g. KDE). As it is, this recipe fails for GMM:

import numpy as np
from sklearn.mixture import GMM
from sklearn.datasets import make_blobs
from sklearn.grid_search import GridSearchCV

X, y = make_blobs(100, 2, centers=3)

# use grid search cross-validation to optimize the gmm model
params = {'n_components': range(1, 5)}
grid = GridSearchCV(GMM(), params)
grid.fit(X)

print grid.best_estimator_.n_components

The result:

ValueError: scoring must return a number, got <type 'numpy.ndarray'> instead.
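
Until score() itself returns a number, one possible workaround is to hand the grid search a scoring callable that collapses the per-sample array. The sketch below assumes that GridSearchCV's scoring parameter accepts a callable of the form (estimator, X, y=None); the names total_log_likelihood and StubEstimator are hypothetical, and the stub only mimics the array-returning score() described in this report.

```python
import math


def total_log_likelihood(estimator, X, y=None):
    """Collapse the per-sample scores from estimator.score(X) into one float."""
    # math.fsum accepts any iterable of floats, including a 1-D NumPy array.
    return math.fsum(estimator.score(X))


class StubEstimator:
    """Hypothetical stand-in whose score() returns one value per sample."""

    def score(self, X):
        return [-0.5 * x * x for x in X]


result = total_log_likelihood(StubEstimator(), [0.0, 1.0, 2.0])
```

Under that assumption, the recipe above would become GridSearchCV(GMM(), params, scoring=total_log_likelihood). This only hides the inconsistency at the call site; it does not replace the fix proposed in this issue.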
