increase.

When using, for example, :ref:`cross validation <cross_validation>` to
set the amount of regularization with `C`, there will be a different
number of samples between every problem that we are using for model
selection, as well as for the final problem that we want to use for
training.

This raises the question: `how do we adjust C to
account for the different training samples?`

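As a minimal sketch of the setting described above (the synthetic data, the estimator, and the grid of `C` values are assumptions for illustration, not part of this example), cross-validation selects `C` on folds that contain fewer samples than the final training set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Hypothetical synthetic problem.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# With 3-fold cross-validation each candidate model is fit on ~200
# samples, while the final refit uses all 300 -- so the effective
# amount of regularization implied by a fixed `C` differs between
# model selection and the final fit.
grid = GridSearchCV(
    LinearSVC(dual=False, max_iter=10_000),
    param_grid={"C": np.logspace(-2, 2, 9)},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```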
The figures below are used to illustrate the effect of scaling our
`C` to compensate for the change in the number of samples, in the
case of using an `L1` penalty, as well as the `L2` penalty.

L1-penalty case
---------------
In the `L1` case, theory says that prediction consistency
(i.e. that under a given hypothesis, the estimator
learned predicts as well as a model knowing the true distribution)
is not possible because of the bias of the `L1`. It does say, however,
that model consistency, in terms of finding the right set of non-zero
parameters as well as their signs, can be achieved by scaling
`C`.

fractions of a generated data-set.

In the `L1` penalty case, the results are best when scaling our `C` with
the number of samples, `n`, which can be seen in the first figure.

For the `L2` penalty case, the best result comes from the case where `C`
is not scaled.
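The `L1` scaling rule can be sketched as follows; this is an assumed illustration (the data, the base value `base_C`, and the subset sizes are made up, not taken from the figures), showing `C` grown linearly with the number of training samples so the per-sample data-fit strength stays comparable:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hypothetical: C tuned on a subset of base_n samples; rescale it
# proportionally to n when training on larger sets (L1 case).
base_C, base_n = 0.1, 250
for n in (250, 500, 1000):
    C = base_C * n / base_n  # scaled C for the L1 penalty
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                    C=C, max_iter=10_000)
    clf.fit(X[:n], y[:n])
    n_nonzero = np.count_nonzero(clf.coef_)
    print(f"n={n:4d}  C={C:.3f}  non-zero coefficients: {n_nonzero}")
```

For the `L2` penalty, by contrast, the conclusion above is that `C` is simply left unscaled across the differently sized training sets.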