@@ -31,8 +31,8 @@ The disadvantages of support vector machines include:
   samples, the method is likely to give poor performances.

 - SVMs do not directly provide probability estimates, these are
-  calculated using five-fold cross-validation, and thus
-  performance can suffer.
+  calculated using an expensive five-fold cross-validation
+  (see :ref:`Scores and probabilities <scores_probabilities>`, below).

 The support vector machines in scikit-learn support both dense
 (``numpy.ndarray`` and convertible to that by ``numpy.asarray``) and
@@ -41,7 +41,6 @@ an SVM to make predictions for sparse data, it must have been fit on such
 data. For optimal performance, use C-ordered ``numpy.ndarray`` (dense) or
 ``scipy.sparse.csr_matrix`` (sparse) with ``dtype=float64``.

-.. TODO: add reference to probability estimates

 .. _svm_classification:

@@ -196,6 +195,30 @@ this:
 +------------------------+------------------------+------------------+


+.. _scores_probabilities:
+
+Scores and probabilities
+------------------------
+
+The :class:`SVC` method ``decision_function`` gives per-class scores
+for each sample (or a single score per sample in the binary case).
+When the constructor option ``probability`` is set to ``True``,
+class membership probability estimates
+(from the methods ``predict_proba`` and ``predict_log_proba``) are enabled.
+In the binary case, the probabilities are calibrated using Platt's method:
+logistic regression on the SVM's scores,
+fit by an additional cross-validation on the training data.
+Needless to say, this is an expensive operation for large datasets.
+In the multiclass case, this is extended as per Wu et al. (2004).
+
+.. topic:: References:
+
+ * Wu, Lin and Weng,
+   `"Probability estimates for multi-class classification by pairwise coupling"
+   <http://www.csie.ntu.edu.tw/~cjlin/papers/svmprob/svmprob.pdf>`_.
+   JMLR 5:975-1005, 2004.
+
+
 Unbalanced problems
 --------------------

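The "Scores and probabilities" section added by this change can be tried out with a short script. The following is a minimal sketch, not part of the commit: it assumes a recent scikit-learn in which ``SVC`` still accepts ``probability=True``, and the synthetic dataset and variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary problem (illustrative data, not taken from the commit).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# probability=True turns on predict_proba / predict_log_proba; the calibration
# runs an extra internal cross-validation, so fitting is slower.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

scores = clf.decision_function(X[:5])     # binary case: one score per sample
proba = clf.predict_proba(X[:5])          # one column per class, rows sum to 1
log_proba = clf.predict_log_proba(X[:5])  # elementwise log of proba

print(scores.shape, proba.shape)
```

Because the probabilities come from a separate calibration step, they need not agree exactly with the sign of ``decision_function`` for samples near the decision boundary.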