Commit 0eb4cd8

ENH refactor OVO decision function, use it in SVC for sklearn-like decision_function shape
1 parent 4eda9e6 commit 0eb4cd8

File tree

13 files changed (+271 −131 lines)

doc/modules/model_persistence.rst

+4 −3

@@ -22,9 +22,10 @@ persistence model, namely `pickle <http://docs.python.org/library/pickle.html>`_
  >>> iris = datasets.load_iris()
  >>> X, y = iris.data, iris.target
  >>> clf.fit(X, y) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  >>> import pickle
  >>> s = pickle.dumps(clf)
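
The hunk stops at ``pickle.dumps``; a short illustrative round-trip (not part of the commit) shows the loaded model predicting like the original:

# Illustrative only: full pickle round-trip of a fitted classifier.
import pickle
from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC().fit(iris.data, iris.target)

clf2 = pickle.loads(pickle.dumps(clf))
print(clf2.predict(iris.data[:1]))  # same prediction as clf, e.g. [0]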

doc/modules/pipeline.rst

+6 −6

@@ -42,9 +42,9 @@ is an estimator object::
  >>> clf # doctest: +NORMALIZE_WHITESPACE
  Pipeline(steps=[('reduce_dim', PCA(copy=True, n_components=None,
      whiten=False)), ('svm', SVC(C=1.0, cache_size=200, class_weight=None,
-     coef0=0.0, degree=3, gamma=0.0, kernel='rbf', max_iter=-1,
-     probability=False, random_state=None, shrinking=True, tol=0.001,
-     verbose=False))])
+     coef0=0.0, decision_function_shape=None, degree=3, gamma=0.0,
+     kernel='rbf', max_iter=-1, probability=False, random_state=None,
+     shrinking=True, tol=0.001, verbose=False))])

  The utility function :func:`make_pipeline` is a shorthand
  for constructing pipelines;

@@ -76,9 +76,9 @@ Parameters of the estimators in the pipeline can be accessed using the
  >>> clf.set_params(svm__C=10) # doctest: +NORMALIZE_WHITESPACE
  Pipeline(steps=[('reduce_dim', PCA(copy=True, n_components=None,
      whiten=False)), ('svm', SVC(C=10, cache_size=200, class_weight=None,
-     coef0=0.0, degree=3, gamma=0.0, kernel='rbf', max_iter=-1,
-     probability=False, random_state=None, shrinking=True, tol=0.001,
-     verbose=False))])
+     coef0=0.0, decision_function_shape=None, degree=3, gamma=0.0,
+     kernel='rbf', max_iter=-1, probability=False, random_state=None,
+     shrinking=True, tol=0.001, verbose=False))])

  This is particularly important for doing grid searches::
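
For context on the ``svm__C`` syntax in the hunk above, a minimal illustrative sketch (not from the commit) of the ``<step name>__<parameter>`` convention:

# Illustrative only: nested parameters are addressed as <step>__<param>.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC

clf = Pipeline(steps=[('reduce_dim', PCA()), ('svm', SVC())])
clf.set_params(svm__C=10)          # reaches into the 'svm' step
print(clf.get_params()['svm__C'])  # 10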

doc/modules/svm.rst

+22 −11

@@ -76,9 +76,10 @@ n_features]`` holding the training samples, and an array y of class labels
  >>> y = [0, 1]
  >>> clf = svm.SVC()
  >>> clf.fit(X, y) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
-     gamma=0.0, kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  After being fitted, the model can then be used to predict new values::

@@ -109,18 +110,27 @@ Multi-class classification
  :class:`SVC` and :class:`NuSVC` implement the "one-against-one"
  approach (Knerr et al., 1990) for multi-class classification. If
  ``n_class`` is the number of classes, then ``n_class * (n_class - 1) / 2``
- classifiers are constructed and each one trains data from two classes::
+ classifiers are constructed and each one trains data from two classes.
+ To provide a consistent interface with other classifiers, the
+ ``decision_function_shape`` option allows aggregating the results of the
+ "one-against-one" classifiers into a decision function of shape ``(n_samples,
+ n_classes)``::

  >>> X = [[0], [1], [2], [3]]
  >>> Y = [0, 1, 2, 3]
- >>> clf = svm.SVC()
+ >>> clf = svm.SVC(decision_function_shape='ovo')
  >>> clf.fit(X, Y) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
-     gamma=0.0, kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape='ovo', degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)
  >>> dec = clf.decision_function([[1]])
  >>> dec.shape[1] # 4 classes: 4*3/2 = 6
  6
+ >>> clf.decision_function_shape = "ovr"
+ >>> dec = clf.decision_function([[1]])
+ >>> dec.shape[1] # 4 classes
+ 4

  On the other hand, :class:`LinearSVC` implements "one-vs-the-rest"
  multi-class strategy, thus training n_class models. If there are only
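
A standalone sketch of the behavior the new doctest exercises (illustrative only; it assumes a build that includes this change):

# Compare the two decision_function shapes on the doctest's 4-class toy data.
import numpy as np
from sklearn import svm

X = [[0], [1], [2], [3]]
y = [0, 1, 2, 3]  # 4 classes -> 4 * 3 / 2 = 6 one-vs-one classifiers

clf = svm.SVC(decision_function_shape='ovo').fit(X, y)
print(clf.decision_function([[1]]).shape)  # (1, 6): one column per class pair

clf.decision_function_shape = 'ovr'
dec = clf.decision_function(X)
print(dec.shape)  # (4, 4): one column per class, as in other classifiers

# The aggregation only lets confidences break vote ties, so the column
# argmax should agree with predict().
print(np.all(clf.classes_[dec.argmax(axis=1)] == clf.predict(X)))
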
@@ -503,9 +513,10 @@ test vectors must be provided.
  >>> # linear kernel computation
  >>> gram = np.dot(X, X.T)
  >>> clf.fit(gram, y) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
-     gamma=0.0, kernel='precomputed', max_iter=-1, probability=False,
-     random_state=None, shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0,
+     kernel='precomputed', max_iter=-1, probability=False,
+     random_state=None, shrinking=True, tol=0.001, verbose=False)
  >>> # predict on training examples
  >>> clf.predict(gram)
  array([0, 1])
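
Since the hunk's context notes that test vectors must also be supplied as kernel values, here is a short illustrative sketch (not from the commit) of predicting with ``kernel='precomputed'``:

# Illustrative only: predict() takes the gram matrix between test samples
# and the *training* samples when kernel='precomputed'.
import numpy as np
from sklearn import svm

X_train = np.array([[0., 0.], [1., 1.]])
y_train = [0, 1]
X_test = np.array([[0.1, 0.2], [1.1, 0.9]])

clf = svm.SVC(kernel='precomputed')
clf.fit(np.dot(X_train, X_train.T), y_train)  # shape (n_train, n_train)

gram_test = np.dot(X_test, X_train.T)         # shape (n_test, n_train)
print(clf.predict(gram_test))                 # expected: [0 1]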

doc/tutorial/basic/tutorial.rst

+28 −22

@@ -176,9 +176,10 @@ which produces a new array that contains all but
  the last entry of ``digits.data``::

  >>> clf.fit(digits.data[:-1], digits.target[:-1]) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
-     gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
-     random_state=None, shrinking=True, tol=0.001, verbose=False)
+ SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  Now you can predict new values; in particular, we can ask the
  classifier what the digit of our last image in the ``digits`` dataset is,

@@ -214,9 +215,10 @@ persistence model, namely `pickle <http://docs.python.org/library/pickle.html>`_
  >>> iris = datasets.load_iris()
  >>> X, y = iris.data, iris.target
  >>> clf.fit(X, y) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  >>> import pickle
  >>> s = pickle.dumps(clf)

@@ -287,18 +289,20 @@ maintained::

  >>> iris = datasets.load_iris()
  >>> clf = SVC()
- >>> clf.fit(iris.data, iris.target)
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ >>> clf.fit(iris.data, iris.target) # doctest: +NORMALIZE_WHITESPACE
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  >>> list(clf.predict(iris.data[:3]))
  [0, 0, 0]

- >>> clf.fit(iris.data, iris.target_names[iris.target])
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ >>> clf.fit(iris.data, iris.target_names[iris.target]) # doctest: +NORMALIZE_WHITESPACE
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  >>> list(clf.predict(iris.data[:3])) # doctest: +NORMALIZE_WHITESPACE
  ['setosa', 'setosa', 'setosa']

@@ -324,17 +328,19 @@ more than once will overwrite what was learned by any previous ``fit()``::
  >>> X_test = rng.rand(5, 10)

  >>> clf = SVC()
- >>> clf.set_params(kernel='linear').fit(X, y)
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='linear', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ >>> clf.set_params(kernel='linear').fit(X, y) # doctest: +NORMALIZE_WHITESPACE
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='linear',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)
  >>> clf.predict(X_test)
  array([1, 0, 1, 1, 0])

- >>> clf.set_params(kernel='rbf').fit(X, y)
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='rbf', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ >>> clf.set_params(kernel='rbf').fit(X, y) # doctest: +NORMALIZE_WHITESPACE
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='rbf',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)
  >>> clf.predict(X_test)
  array([0, 0, 0, 1, 0])
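
As a side note to the refitting hunks above, an illustrative sketch (not from the commit) of keeping the first model intact with ``clone``:

# Illustrative only: refitting overwrites learned state, so copy the
# estimator with clone() if the original configuration is still needed.
from sklearn.base import clone
from sklearn.svm import SVC

first = SVC(kernel='linear')
second = clone(first)               # same params, independent object
second.set_params(kernel='rbf')     # does not affect `first`
print(first.kernel, second.kernel)  # linear rbf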

doc/tutorial/statistical_inference/supervised_learning.rst

+4 −3

@@ -453,9 +453,10 @@ classification --:class:`SVC` (Support Vector Classification).
  >>> from sklearn import svm
  >>> svc = svm.SVC(kernel='linear')
  >>> svc.fit(iris_X_train, iris_y_train) # doctest: +NORMALIZE_WHITESPACE
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
-     kernel='linear', max_iter=-1, probability=False, random_state=None,
-     shrinking=True, tol=0.001, verbose=False)
+ SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
+     decision_function_shape=None, degree=3, gamma=0.0, kernel='linear',
+     max_iter=-1, probability=False, random_state=None, shrinking=True,
+     tol=0.001, verbose=False)

  .. warning:: **Normalizing data**

doc/whats_new.rst

+5 −0

@@ -66,6 +66,11 @@ API changes summary
    for retrieving the leaf indices samples are predicted as. By
    `Daniel Galvez`_ and `Gilles Louppe`_.

+   - :class:`svm.SVC` and :class:`svm.NuSVC` now have a ``decision_function_shape``
+     parameter to make their decision function of shape ``(n_samples, n_classes)``
+     by setting ``decision_function_shape='ovr'``. This will be the default behavior
+     starting in 0.19. By `Andreas Müller`_.
+
  .. _changes_0_1_16:

  0.16.1

sklearn/base.py

+5 −1

@@ -11,7 +11,11 @@
  from .externals import six


- ###############################################################################
+ class ChangedBehaviorWarning(UserWarning):
+     pass
+
+
+ ##############################################################################
  def clone(estimator, safe=True):
      """Constructs a new estimator with the same parameters.

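Moving the class into ``sklearn.base`` lets downstream code import it without touching ``grid_search``. A hypothetical usage sketch (not part of the commit) for silencing behavior-change warnings by category:

# Illustrative only: filter by warning category rather than by message
# text, which is version specific.
import warnings
from sklearn.base import ChangedBehaviorWarning  # location as of this commit

with warnings.catch_warnings():
    warnings.simplefilter('ignore', ChangedBehaviorWarning)
    ...  # estimator calls that may emit ChangedBehaviorWarning
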
sklearn/grid_search.py

+5 −8

@@ -20,7 +20,7 @@
  import numpy as np

  from .base import BaseEstimator, is_classifier, clone
- from .base import MetaEstimatorMixin
+ from .base import MetaEstimatorMixin, ChangedBehaviorWarning
  from .cross_validation import _check_cv as check_cv
  from .cross_validation import _fit_and_score
  from .externals.joblib import Parallel, delayed

@@ -304,10 +304,6 @@ def __repr__(self):
                           self.parameters)


- class ChangedBehaviorWarning(UserWarning):
-     pass
-
-
  class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator,
                                        MetaEstimatorMixin)):
      """Base class for hyper parameter search with cross-validation."""

@@ -642,9 +638,10 @@ class GridSearchCV(BaseSearchCV):
  ...                        # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
  GridSearchCV(cv=None, error_score=...,
         estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
-                      degree=..., gamma=..., kernel='rbf', max_iter=-1,
-                      probability=False, random_state=None, shrinking=True,
-                      tol=..., verbose=False),
+                      decision_function_shape=None, degree=..., gamma=...,
+                      kernel='rbf', max_iter=-1, probability=False,
+                      random_state=None, shrinking=True, tol=...,
+                      verbose=False),
         fit_params={}, iid=..., n_jobs=1,
         param_grid=..., pre_dispatch=..., refit=...,
         scoring=..., verbose=...)
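
The doctest repr above comes from a search like the following illustrative sketch (not from the commit; ``sklearn.grid_search`` is the module path of that era):

# Illustrative only: a small grid search over SVC's C parameter.
from sklearn import datasets, svm
from sklearn.grid_search import GridSearchCV

iris = datasets.load_iris()
search = GridSearchCV(svm.SVC(), param_grid={'C': [0.1, 1, 10]})
search.fit(iris.data, iris.target)
print(search.best_params_)  # e.g. {'C': 1}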

sklearn/multiclass.py

+52 −30

@@ -552,36 +552,58 @@ def decision_function(self, X):
          """
          check_is_fitted(self, 'estimators_')

-         n_samples = X.shape[0]
-         n_classes = self.classes_.shape[0]
-         votes = np.zeros((n_samples, n_classes))
-         sum_of_confidences = np.zeros((n_samples, n_classes))
-
-         k = 0
-         for i in range(n_classes):
-             for j in range(i + 1, n_classes):
-                 pred = self.estimators_[k].predict(X)
-                 confidence_levels_ij = _predict_binary(self.estimators_[k], X)
-                 sum_of_confidences[:, i] -= confidence_levels_ij
-                 sum_of_confidences[:, j] += confidence_levels_ij
-                 votes[pred == 0, i] += 1
-                 votes[pred == 1, j] += 1
-                 k += 1
-
-         max_confidences = sum_of_confidences.max()
-         min_confidences = sum_of_confidences.min()
-
-         if max_confidences == min_confidences:
-             return votes
-
-         # Scale the sum_of_confidences to (-0.5, 0.5) and add it with votes.
-         # The motivation is to use confidence levels as a way to break ties in
-         # the votes without switching any decision made based on a difference
-         # of 1 vote.
-         eps = np.finfo(sum_of_confidences.dtype).eps
-         max_abs_confidence = max(abs(max_confidences), abs(min_confidences))
-         scale = (0.5 - eps) / max_abs_confidence
-         return votes + sum_of_confidences * scale
+         predictions = np.vstack([est.predict(X) for est in self.estimators_]).T
+         confidences = np.vstack([_predict_binary(est, X) for est in self.estimators_]).T
+         return _ovr_decision_function(predictions, confidences,
+                                       len(self.classes_))
+
+
+ def _ovr_decision_function(predictions, confidences, n_classes):
+     """Compute a continuous, tie-breaking ovo decision function.
+
+     It is important to include a continuous value, not only votes,
+     to make computing AUC or calibration meaningful.
+
+     Parameters
+     ----------
+     predictions : array-like, shape (n_samples, n_classifiers)
+         Predicted classes for each binary classifier.
+
+     confidences : array-like, shape (n_samples, n_classifiers)
+         Decision functions or predicted probabilities for the positive class
+         for each binary classifier.
+
+     n_classes : int
+         Number of classes. n_classifiers must be
+         ``n_classes * (n_classes - 1) / 2``.
+     """
+     n_samples = predictions.shape[0]
+     votes = np.zeros((n_samples, n_classes))
+     sum_of_confidences = np.zeros((n_samples, n_classes))
+
+     k = 0
+     for i in range(n_classes):
+         for j in range(i + 1, n_classes):
+             sum_of_confidences[:, i] -= confidences[:, k]
+             sum_of_confidences[:, j] += confidences[:, k]
+             votes[predictions[:, k] == 0, i] += 1
+             votes[predictions[:, k] == 1, j] += 1
+             k += 1
+
+     max_confidences = sum_of_confidences.max()
+     min_confidences = sum_of_confidences.min()
+
+     if max_confidences == min_confidences:
+         return votes
+
+     # Scale the sum_of_confidences to (-0.5, 0.5) and add it to the votes.
+     # The motivation is to use confidence levels as a way to break ties in
+     # the votes without switching any decision made based on a difference
+     # of 1 vote.
+     eps = np.finfo(sum_of_confidences.dtype).eps
+     max_abs_confidence = max(abs(max_confidences), abs(min_confidences))
+     scale = (0.5 - eps) / max_abs_confidence
+     return votes + sum_of_confidences * scale


  @deprecated("fit_ecoc is deprecated and will be removed in 0.18."
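
To see what the refactored helper computes, here is a small self-contained re-implementation of the same aggregation (illustrative; it mirrors ``_ovr_decision_function`` above) run on one sample of a 3-class problem:

# Pairs are enumerated as (0,1), (0,2), (1,2); binary label 0 means class i.
import numpy as np

def ovr_from_ovo(predictions, confidences, n_classes):
    n_samples = predictions.shape[0]
    votes = np.zeros((n_samples, n_classes))
    sum_of_confidences = np.zeros((n_samples, n_classes))
    k = 0
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            sum_of_confidences[:, i] -= confidences[:, k]
            sum_of_confidences[:, j] += confidences[:, k]
            votes[predictions[:, k] == 0, i] += 1
            votes[predictions[:, k] == 1, j] += 1
            k += 1
    max_conf, min_conf = sum_of_confidences.max(), sum_of_confidences.min()
    if max_conf == min_conf:
        return votes
    eps = np.finfo(sum_of_confidences.dtype).eps
    scale = (0.5 - eps) / max(abs(max_conf), abs(min_conf))
    # Scaled confidences lie in (-0.5, 0.5): enough to break ties between
    # equal vote counts, never enough to overturn a one-vote margin.
    return votes + sum_of_confidences * scale

preds = np.array([[0, 0, 1]])          # (0,1)->0, (0,2)->0, (1,2)->2
confs = np.array([[-1.0, -0.2, 0.3]])  # signed confidence for each pair
print(ovr_from_ovo(preds, confs, 3))   # ~[[2.46, -0.5, 1.04]]: class 0 wins

The continuous output is what makes computing AUC or calibration on top of the OVO votes meaningful, as the new docstring notes.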
