Description
The `GMM.score()` function returns an array rather than a single value. This is inconsistent with the rest of scikit-learn: for example, both `sklearn.base.ClassifierMixin` and `sklearn.base.RegressorMixin` implement a `score()` function which returns a single number, as do `KMeans`, `KernelDensity`, `PCA`, `GaussianHMM`, and others.
Currently, `GMM.score()` returns an array of the individual scores for each sample. That method should probably be called `GMM.score_samples()`, and `GMM.score()` should return `sum(GMM.score_samples())`.
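This is the convention `KernelDensity` already follows, and it is easy to verify (a quick sketch assuming a scikit-learn version where `KernelDensity` lives in `sklearn.neighbors`):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

X = np.random.RandomState(0).normal(size=(100, 2))
kde = KernelDensity().fit(X)

# score_samples() gives the per-sample log-likelihoods...
per_sample = kde.score_samples(X)

# ...while score() aggregates them into a single number
total = kde.score(X)
assert np.allclose(total, per_sample.sum())
```

The proposal is simply that `GMM` adopt this same relationship between the two methods.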
Note that in the last release, we renamed `GMM.eval()` to `GMM.score_samples()`. I believe this was a mistake: the `score_samples` label has a very general meaning (e.g. it is used within `KernelDensity`), while `GMM.eval()` returns a tuple containing the per-cluster likelihoods, which makes sense only for GMM.
If this change were made so that `GMM.score()` returned a single number, then the following recipe would work to optimize a GMM model (as it does for, e.g., KDE). As it is, this recipe fails for GMM:
```python
import numpy as np
from sklearn.mixture import GMM
from sklearn.datasets import make_blobs
from sklearn.grid_search import GridSearchCV

X, y = make_blobs(100, 2, centers=3)

# use grid search cross-validation to optimize the gmm model
params = {'n_components': range(1, 5)}
grid = GridSearchCV(GMM(), params)
grid.fit(X)
print(grid.best_estimator_.n_components)
```
The result:

```
ValueError: scoring must return a number, got <type 'numpy.ndarray'> instead.
```
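For contrast, the equivalent recipe already works for `KernelDensity`, whose `score()` returns a single number (a sketch using the newer `sklearn.model_selection` import path for `GridSearchCV`; older releases expose it under `sklearn.grid_search` as above):

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

X = np.random.RandomState(0).normal(size=(100, 1))

# grid search over the KDE bandwidth works out of the box,
# because KernelDensity.score() returns a single float
params = {'bandwidth': np.logspace(-1, 1, 10)}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(X)
print(grid.best_estimator_.bandwidth)
```

If `GMM.score()` returned a single number, the GMM version of this recipe would work the same way.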