DOC describe SVM probability calibration (and advise against it) · nullnotfound/scikit-learn@b1a97de · GitHub
DOC describe SVM probability calibration (and advise against it)
1 parent cb7ba3a commit b1a97de


doc/modules/svm.rst

Lines changed: 26 additions & 3 deletions
@@ -31,8 +31,8 @@ The disadvantages of support vector machines include:
       samples, the method is likely to give poor performances.
 
     - SVMs do not directly provide probability estimates, these are
-      calculated using five-fold cross-validation, and thus
-      performance can suffer.
+      calculated using an expensive five-fold cross-validation
+      (see :ref:`Scores and probabilities <scores_probabilities>`, below).
 
 The support vector machines in scikit-learn support both dense
 (``numpy.ndarray`` and convertible to that by ``numpy.asarray``) and
@@ -41,7 +41,6 @@ an SVM to make predictions for sparse data, it must have been fit on such
 data. For optimal performance, use C-ordered ``numpy.ndarray`` (dense) or
 ``scipy.sparse.csr_matrix`` (sparse) with ``dtype=float64``.
 
-.. TODO: add reference to probability estimates
 
 .. _svm_classification:
 
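(Aside, not part of the commit: the unchanged context lines around this hunk describe scikit-learn's dense/sparse input handling. A minimal sketch of that point, assuming a recent scikit-learn release; the toy dataset from ``make_classification`` and the kernel choice are arbitrary illustrations:)

# An SVM meant to predict on sparse data must have been fit on sparse data,
# ideally CSR matrices with dtype float64.
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
X_sparse = csr_matrix(X)          # CSR, dtype float64

clf = SVC(kernel="linear").fit(X_sparse, y)
print(clf.predict(X_sparse[:5]))  # predictions for the first five samples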
@@ -196,6 +195,30 @@ this:
 +------------------------+------------------------+------------------+
 
 
+.. _scores_probabilities:
+
+Scores and probabilities
+------------------------
+
+The :class:`SVC` method ``decision_function`` gives per-class scores
+for each sample (or a single score per sample in the binary case).
+When the constructor option ``probability`` is set to ``True``,
+class membership probability estimates
+(from the methods ``predict_proba`` and ``predict_log_proba``) are enabled.
+In the binary case, the probabilities are calibrated using Platt's method:
+logistic regression on the SVM's scores,
+fit by an additional cross-validation on the training data.
+Needless to say, this is an expensive operation for large datasets.
+In the multiclass case, this is extended as per Wu et al. (2004).
+
+.. topic:: References:
+
+    * Wu, Lin and Weng,
+      `"Probability estimates for multi-class classification by pairwise coupling"
+      <http://www.csie.ntu.edu.tw/~cjlin/papers/svmprob/svmprob.pdf>`_.
+      JMLR 5:975-1005, 2004.
+
+
 Unbalanced problems
 --------------------
 
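(Aside, not part of the commit: a minimal sketch of the behaviour the new "Scores and probabilities" section documents, again assuming a recent scikit-learn; the dataset and hyperparameters are arbitrary:)

# decision_function vs. probability estimates on a toy binary problem.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# decision_function is always available: a single margin score per sample
# in the binary case, per-class scores otherwise (as described above).
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.decision_function(X[:3]))

# predict_proba / predict_log_proba require probability=True at construction.
# Fitting is noticeably more expensive: Platt scaling fits a logistic
# regression on the SVM scores via an extra cross-validation on the training data.
clf_proba = SVC(kernel="rbf", C=1.0, probability=True).fit(X, y)
print(clf_proba.predict_proba(X[:3]))
print(clf_proba.predict_log_proba(X[:3]))

The slower fit with ``probability=True`` is exactly the cost the revised bullet point warns about: the cross-validated calibration has to run before any probabilities become available.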