gael`s suggestions/tweaks · seckcoder/scikit-learn@254b109 · GitHub

Commit 254b109

jaquesgrobler authored and amueller committed
gael`s suggestions/tweaks
1 parent 6b04635 commit 254b109

1 file changed

examples/svm/plot_svm_scale_c.py

Lines changed: 37 additions & 39 deletions
@@ -4,8 +4,8 @@
 =========================================================================
 
 The following example illustrates the effect of scaling the
-regularization parameter when using :ref:`svm` for
-:ref:`classification <svm_classification>`.
+regularization parameter when using :ref:`svm` for
+:ref:`classification <svm_classification>`.
 For SVC classification, we are interested in a risk minimization for the
 equation:
 
@@ -21,35 +21,35 @@
 and our model parameters.
 - :math:`\Omega` is a `penalty` function of our model parameters
 
-If we consider the :math:`\mathcal{L}` function to be the individual error per
-sample, then the data-fit term, or the sum of the error for each sample, will
-increase as we add more samples. The penalization term, however, will not
+If we consider the loss function to be the individual error per
+sample, then the data-fit term, or the sum of the error for each sample, will
+increase as we add more samples. The penalization term, however, will not
 increase.
 
 When using, for example, :ref:`cross validation <cross_validation>`, to
-set amount of regularization with :math:`C`, there will be a different
-amount of samples between every problem that we are using for model
-selection, as well as for the final problem that we want to use for
+set amount of regularization with `C`, there will be a different
+amount of samples between every problem that we are using for model
+selection, as well as for the final problem that we want to use for
 training.
 
 Since our loss function is dependant on the amount of samples, the latter
-will influence the selected value of :math:`C`.
+will influence the selected value of `C`.
 The question that arises is `How do we optimally adjust C to
 account for the different training samples?`
 
 The figures below are used to illustrate the effect of scaling our
-:math:`C` to compensate for the change in the amount of samples, in the
-case of using an :math:`L1` penalty, as well as the :math:`L2` penalty.
+`C` to compensate for the change in the amount of samples, in the
+case of using an `L1` penalty, as well as the `L2` penalty.
 
 L1-penalty case
 -----------------
-In the :math:`L1` case, theory says that prediction consistency
+In the `L1` case, theory says that prediction consistency
 (i.e. that under given hypothesis, the estimator
-learned predicts as well as an model knowing the true distribution)
-is not possible because of the biasof the :math:`L1`. It does say, however,
+learned predicts as well as an model knowing the true distribution)
+is not possible because of the biasof the `L1`. It does say, however,
 that model consistancy, in terms of finding the right set of non-zero
-parameters as well as their signs, can be achieved by scaling
-:math:`C1`.
+parameters as well as their signs, can be achieved by scaling
+`C1`.
 
 L2-penalty case
 -----------------
@@ -59,17 +59,21 @@
 Simulations
 ------------
 
-The two figures below plot the values of :math:`C` on the `x-axis` and the
+The two figures below plot the values of `C` on the `x-axis` and the
 corresponding cross-validation scores on the `y-axis`, for several different
 fractions of a generated data-set.
 
-In the :math:`L1` penalty case, the results are best when scaling our :math:`C` with
+In the `L1` penalty case, the results are best when scaling our `C` with
 the amount of samples, `n`, which can be seen in the third plot of the first figure.
 
-For the :math:`L2` penalty case, the best result comes from the case where :math:`C`
+For the `L2` penalty case, the best result comes from the case where `C`
 is not scaled.
 
+.. topic:: Note:
 
+    Two seperate datasets are used for the two different plots. The reason
+    behind this is the `L1` case works better on sparse data, while `L2`
+    is better suited to the non-sparse case.
 """
 print __doc__
 
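The risk-minimization equation that the docstring points to sits in unchanged lines outside these hunks. As context only, a hedged reconstruction from the terms the surrounding text defines (per-sample loss :math:`\mathcal{L}`, penalty :math:`\Omega` on the parameters `w`, regularization parameter `C`, and `n` samples) would read, in LaTeX:

    C \sum_{i=1}^{n} \mathcal{L}\left(f(x_i), y_i\right) + \Omega(w)

This is a reconstruction for readability, not the file's verbatim text; it is the form in which the data-fit term (the sum) grows with `n` while the penalty term does not.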
@@ -94,32 +98,29 @@
 # set up dataset
 n_samples = 100
 n_features = 300
-
+
 #L1 data (only 5 informative features)
 X_1, y_1 = datasets.make_classification(n_samples=n_samples, n_features=n_features,
                                         n_informative=5, random_state=1)
-
+
 #L2 data: non sparse, but less features
 y_2 = np.sign(.5 - rnd.rand(n_samples))
 X_2 = rnd.randn(n_samples, n_features/5) + y_2[:, np.newaxis]
 X_2 += 5 * rnd.randn(n_samples, n_features/5)
-
-clf_sets = [(LinearSVC(penalty='L1', loss='L2', dual=False,
+
+clf_sets = [(LinearSVC(penalty='L1', loss='L2', dual=False,
                        tol=1e-3),
              np.logspace(-2.2, -1.2, 10), X_1, y_1),
-            (LinearSVC(penalty='L2', loss='L2', dual=True,
+            (LinearSVC(penalty='L2', loss='L2', dual=True,
                        tol=1e-4),
              np.logspace(-4.5, -2, 10), X_2, y_2)]
-
+
 colors = ['b', 'g', 'r', 'c']
 
 for fignum, (clf, cs, X, y) in enumerate(clf_sets):
     # set up the plot for each regressor
     pl.figure(fignum, figsize=(9, 10))
-    pl.clf
-    pl.xlabel('C')
-    pl.ylabel('CV Score')
-
+
     for k, train_size in enumerate(np.linspace(0.3, 0.7, 3)[::-1]):
         param_grid = dict(C=cs)
         # To get nice curve, we need a large number of iterations to
@@ -129,23 +130,20 @@
                                             n_iterations=250, random_state=1))
         grid.fit(X, y)
         scores = [x[1] for x in grid.grid_scores_]
-
-        scales = [(1, 'No scaling'),
-                  ((n_samples * train_size), '1/n_samples'),
+
+        scales = [(1, 'No scaling'),
+                  ((n_samples * train_size), '1/n_samples'),
                   ]
 
         for subplotnum, (scaler, name) in enumerate(scales):
             pl.subplot(2, 1, subplotnum + 1)
-            grid_cs = cs * float(scaler)  # scale the C's
+            pl.xlabel('C')
+            pl.ylabel('CV Score')
+            grid_cs = cs * float(scaler)  # scale the C's
             pl.semilogx(grid_cs, scores, label="fraction %.2f" %
                         train_size)
             pl.title('scaling=%s, penalty=%s, loss=%s' % (name, clf.penalty, clf.loss))
 
-            #ymin, ymax = pl.ylim()
-            #pl.axvline(grid_cs[np.argmax(scores)], 0, 1,
-            #           color=colors[k])
-            #pl.ylim(ymin=ymin-0.0025, ymax=ymax+0.008)  # adjust the y-axis
-
     pl.legend(loc="best")
 pl.show()
-
+
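As a quick check of what the inner loop above computes, here is a minimal standalone sketch of the C-scaling step, using only NumPy. The grid and sample sizes match the values in the example; the printed format is illustrative only.

    import numpy as np

    # Candidate C values for the L1-penalised model, as in the example above.
    cs = np.logspace(-2.2, -1.2, 10)

    n_samples = 100
    for train_size in np.linspace(0.3, 0.7, 3)[::-1]:
        n_train = n_samples * train_size  # effective number of training samples
        for scaler, name in [(1, 'No scaling'), (n_train, '1/n_samples')]:
            # Mirrors `grid_cs = cs * float(scaler)`: the second setting stretches
            # the grid so that C grows with the amount of training data.
            grid_cs = cs * float(scaler)
            print("train_size=%.1f  %-12s  C in [%.4g, %.4g]"
                  % (train_size, name, grid_cs[0], grid_cs[-1]))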

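Finally, a hedged sketch of why the scaling matters, assuming a hinge loss and a squared L2 penalty purely for illustration (the example's actual losses and penalties are whatever is set on the LinearSVC estimators above): with a fixed C the data-fit term grows with the number of samples while the penalty stays constant, so dividing C by n keeps the two terms comparable.

    import numpy as np

    rng = np.random.RandomState(0)
    w = rng.randn(20)  # an arbitrary, fixed parameter vector

    def objective(X, y, w, C):
        # C * (sum of per-sample hinge losses) + penalty on the parameters,
        # the generic form the docstring discusses.
        data_fit = np.maximum(0, 1 - y * X.dot(w)).sum()
        penalty = 0.5 * np.dot(w, w)
        return C * data_fit + penalty

    for n in (100, 1000, 10000):
        X = rng.randn(n, 20)
        y = np.sign(rng.randn(n))
        # The unscaled objective grows roughly linearly with n;
        # the 1/n-scaled version stays on a comparable footing.
        print(n, round(objective(X, y, w, C=1.0), 1),
              round(objective(X, y, w, C=1.0 / n), 1))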