@@ -249,6 +249,10 @@ quadratic in the number of samples.
with a large number of training samples (> 10,000) for which the SGD
variant can be several orders of magnitude faster.
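As a rough illustration of this use case, a minimal sketch (with a purely synthetic dataset, assumed here only for illustration) of fitting the SGD variant on a larger sample could look like::

    import numpy as np
    from sklearn.linear_model import SGDOneClassSVM

    rng = np.random.RandomState(42)
    X = rng.randn(20_000, 10)  # hypothetical large training set

    # Linear One-Class SVM fitted with SGD; training time grows linearly
    # with the number of samples rather than quadratically.
    clf = SGDOneClassSVM(nu=0.05, random_state=42)
    clf.fit(X)
    is_inlier = clf.predict(X)  # +1 for inliers, -1 for outliers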
+ |details-start|
+ **Mathematical details**
+ |details-split|
+
Its implementation is based on the implementation of the stochastic
gradient descent. Indeed, the original optimization problem of the One-Class
SVM is given by
@@ -282,6 +286,8 @@ This is similar to the optimization problems studied in section
being the L2 norm. We just need to add the term :math:`b\nu` in the
optimization loop.
+ |details-end|
+
Like :class:`SGDClassifier` and :class:`SGDRegressor`, :class:`SGDOneClassSVM`
supports averaged SGD. Averaging can be enabled by setting ``average=True``.
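As a hedged sketch (the ``nu`` value below is arbitrary), averaging is requested at construction time::

    from sklearn.linear_model import SGDOneClassSVM

    # average=True makes the fitted coefficients the average of the plain
    # SGD coefficients across all updates (averaged SGD).
    clf = SGDOneClassSVM(nu=0.1, average=True)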
@@ -410,6 +416,10 @@ where :math:`L` is a loss function that measures model (mis)fit and
complexity; :math:`\alpha > 0` is a non-negative hyperparameter that controls
the regularization strength.
+ |details-start|
+ **Loss functions details**
+ |details-split|
+
Different choices for :math:`L` entail different classifiers or regressors:

- Hinge (soft-margin): equivalent to Support Vector Classification.
@@ -431,6 +441,8 @@ Different choices for :math:`L` entail different classifiers or regressors:
- Epsilon-Insensitive: (soft-margin) equivalent to Support Vector Regression.
  :math:`L(y_i, f(x_i)) = \max(0, |y_i - f(x_i)| - \varepsilon)`.
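As an illustration (the concrete parameter values below are arbitrary), these losses are selected through the ``loss`` parameter of :class:`SGDClassifier` and :class:`SGDRegressor`::

    from sklearn.linear_model import SGDClassifier, SGDRegressor

    # Hinge loss: a linear classifier akin to a soft-margin SVM
    clf = SGDClassifier(loss="hinge", alpha=1e-4)

    # Epsilon-insensitive loss: a linear regressor akin to SVR
    reg = SGDRegressor(loss="epsilon_insensitive", epsilon=0.1, alpha=1e-4)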
+ |details-end|
+
All of the above loss functions can be regarded as an upper bound on the
misclassification error (Zero-one loss) as shown in the Figure below.