increase.

When using, for example, :ref:`cross validation <cross_validation>`, to
set the amount of regularization with `C`, there will be a different
number of samples between every problem that we are using for model
selection, as well as for the final problem that we want to use for
training.
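This mismatch can be made concrete with a small sketch (the sample count, fold count, and `C` value below are illustrative assumptions, not values from this example):

```python
# Illustrative sketch: with k-fold cross validation, each model-selection
# fit trains on fewer samples than the final fit, so a fixed `C` acts as
# a different per-sample regularization strength in the two settings.
n = 1000                          # hypothetical total number of samples
k = 5                             # hypothetical number of CV folds
n_cv_train = n * (k - 1) // k     # samples seen by each CV training fit
C = 1.0                           # fixed regularization parameter

# The data-fit term sums over samples, so the weight of `C` relative to
# the data changes with the sample count:
strength_cv = C / n_cv_train      # per-sample weight of C during CV
strength_final = C / n            # per-sample weight of C at the final fit
print(n_cv_train, strength_cv, strength_final)
```

With 5 folds, each cross-validation fit sees only 800 of the 1000 samples, so the same `C` regularizes the cross-validation problems more strongly per sample than the final fit.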

account for the different training samples?`

The figures below are used to illustrate the effect of scaling our
`C` to compensate for the change in the number of samples, in the
case of using an `L1` penalty, as well as the `L2` penalty.

L1-penalty case
-----------------
In the `L1` case, theory says that prediction consistency
(i.e. that under a given hypothesis, the estimator
learned predicts as well as a model knowing the true distribution)
is not possible because of the bias of the `L1`. It does say, however,
that model consistency, in terms of finding the right set of non-zero
parameters as well as their signs, can be achieved by scaling
`C1`.
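As a rough illustration of scaling `C` with the training-set size in the `L1` case, the sketch below fits an L1-penalized linear SVM on two fractions of a toy data-set (the data, the base strength `base_C`, and the fractions are assumptions made for illustration; note that `LinearSVC` with `penalty="l1"` requires `dual=False`):

```python
# Rough sketch (toy data, assumed values): scale `C` with the number of
# training samples when using the `L1` penalty.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
n, n_features = 300, 10
X = rng.randn(n, n_features)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only 2 informative features

base_C = 0.01                    # assumed per-sample strength
results = {}
for frac in (0.5, 1.0):          # two training-set fractions
    n_sub = int(n * frac)
    C = base_C * n_sub           # scale C with the sample count
    clf = LinearSVC(penalty="l1", dual=False, C=C)
    clf.fit(X[:n_sub], y[:n_sub])
    # count the non-zero coefficients selected by the L1 penalty
    results[frac] = int(np.sum(np.abs(clf.coef_) > 1e-6))
print(results)
```

Holding the per-sample strength fixed in this way tends to keep the selected set of non-zero coefficients comparable across the two fractions, which is the consistency property referred to above.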

fractions of a generated data-set.

In the `L1` penalty case, the results are best when scaling our `C` with
the number of samples, `n`, which can be seen in the first figure.

For the `L2` penalty case, the best result comes from the case where `C`
is not scaled.