doc/modules/linear_model.rst (+13 −5)
@@ -665,6 +665,8 @@ hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.
   :align: center
   :scale: 50%

+ARD is also known in the literature as *Sparse Bayesian Learning* and
+*Relevance Vector Machine* [3]_ [4]_.

.. topic:: Examples:
@@ -674,7 +676,13 @@ hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.

    .. [1] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 7.2.1

-    .. [2] David Wipf and Srikantan Nagarajan: `A new view of automatic relevance determination. <http://papers.nips.cc/paper/3372-a-new-view-of-automatic-relevance-determination.pdf>`_
+    .. [2] David Wipf and Srikantan Nagarajan: `A new view of automatic relevance determination <http://papers.nips.cc/paper/3372-a-new-view-of-automatic-relevance-determination.pdf>`_
+
+    .. [3] Michael E. Tipping: `Sparse Bayesian Learning and the Relevance Vector Machine <http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf>`_
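For context on the terminology being added, here is a minimal sketch of scikit-learn's ``ARDRegression``, the ARD / Sparse Bayesian Learning estimator these new references describe; the toy data and coefficient values are illustrative assumptions, not part of the patch::

    import numpy as np
    from sklearn.linear_model import ARDRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    true_coef = np.zeros(10)
    true_coef[:3] = [1.5, -2.0, 3.0]        # only three features are relevant
    y = X.dot(true_coef) + 0.1 * rng.randn(100)

    ard = ARDRegression()                   # default hyperpriors
    ard.fit(X, y)
    print(ard.coef_)                        # coefficients of irrelevant features shrink toward 0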
The "lbfgs", "sag" and "newton-cg" solvers only support L2 penalization and
722
730
are found to converge faster for some high dimensional data. Setting
723
731
`multi_class` to "multinomial" with these solvers learns a true multinomial
724
-
logistic regression model [3]_, which means that its probability estimates
732
+
logistic regression model [5]_, which means that its probability estimates
725
733
should be better calibrated than the default "one-vs-rest" setting. The
726
734
"lbfgs", "sag" and "newton-cg"" solvers cannot optimize L1-penalized models,
727
735
therefore the "multinomial" setting does not learn sparse models.
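As a hedged illustration of the behaviour described above (the iris data and parameter values are assumptions made for the sketch, not part of this change)::

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    iris = load_iris()
    X, y = iris.data, iris.target

    # "multinomial" with an L2-capable solver fits a single softmax model;
    # each row of predict_proba sums to 1 across the three classes.
    clf = LogisticRegression(multi_class="multinomial", solver="lbfgs")
    clf.fit(X, y)
    print(clf.predict_proba(X[:3]))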

-The solver "sag" uses a Stochastic Average Gradient descent [4]_. It is faster
+The solver "sag" uses a Stochastic Average Gradient descent [6]_. It is faster
than other solvers for large datasets, when both the number of samples and the
number of features are large.

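A rough sketch of the large-scale use case mentioned here; the synthetic data shape and `max_iter` value are assumptions, not recommendations from the documentation::

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(42)
    X = rng.randn(50000, 300)                      # many samples, many features
    y = (X[:, 0] + 0.5 * rng.randn(50000) > 0).astype(int)

    # "sag" supports only L2 penalties and benefits from roughly scaled features.
    clf = LogisticRegression(solver="sag", max_iter=100)
    clf.fit(X, y)
    print(clf.score(X, y))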
@@ -778,9 +786,9 @@ entropy loss.

.. topic:: References:

-    .. [3] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 4.3.4
+    .. [5] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 4.3.4

-    .. [4] Mark Schmidt, Nicolas Le Roux, and Francis Bach: `Minimizing Finite Sums with the Stochastic Average Gradient. <http://hal.inria.fr/hal-00860051/PDF/sag_journal.pdf>`_
+    .. [6] Mark Schmidt, Nicolas Le Roux, and Francis Bach: `Minimizing Finite Sums with the Stochastic Average Gradient. <http://hal.inria.fr/hal-00860051/PDF/sag_journal.pdf>`_