Merge pull request #6549 from FlorianWilhelm/ardregression_doc_addition · kernc/scikit-learn@113ee40 · GitHub

Commit 113ee40

Merge pull request scikit-learn#6549 from FlorianWilhelm/ardregression_doc_addition
[MRG] Added additional references for ARDRegression
2 parents 195a5eb + 681fd2f commit 113ee40

1 file changed: 13 additions, 5 deletions


doc/modules/linear_model.rst

Lines changed: 13 additions & 5 deletions
@@ -665,6 +665,8 @@ hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.
    :align: center
    :scale: 50%

+ARD is also known in the literature as *Sparse Bayesian Learning* and
+*Relevance Vector Machine* [3]_ [4]_.

 .. topic:: Examples:

@@ -674,7 +676,13 @@ hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.

     .. [1] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 7.2.1

-    .. [2] David Wipf and Srikantan Nagarajan: `A new view of automatic relevance determination. <http://papers.nips.cc/paper/3372-a-new-view-of-automatic-relevance-determination.pdf>`_
+    .. [2] David Wipf and Srikantan Nagarajan: `A new view of automatic relevance determination <http://papers.nips.cc/paper/3372-a-new-view-of-automatic-relevance-determination.pdf>`_
+
+    .. [3] Michael E. Tipping: `Sparse Bayesian Learning and the Relevance Vector Machine <http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf>`_
+
+    .. [4] Tristan Fletcher: `Relevance Vector Machines explained <http://www.tristanfletcher.co.uk/RVM%20Explained.pdf>`_
+
+

 .. _Logistic_regression:
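
A note for readers of this diff: the references added above ([3]_ and [4]_) describe the model family that scikit-learn exposes as ARDRegression. Below is a minimal, illustrative sketch of fitting that estimator; the toy data, feature counts, and noise level are assumptions made here for demonstration and are not part of this commit.

    # ARD, a.k.a. Sparse Bayesian Learning / the Relevance Vector Machine:
    # a per-feature precision prior prunes weights the data does not support.
    import numpy as np
    from sklearn.linear_model import ARDRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    true_w = np.zeros(10)
    true_w[:3] = [1.0, 2.0, -1.5]             # only three informative features
    y = X.dot(true_w) + 0.1 * rng.randn(100)  # illustrative noisy targets

    ard = ARDRegression()
    ard.fit(X, y)
    print(ard.coef_)  # coefficients of irrelevant features shrink toward zero

The sparsity is the point of the Tipping paper cited as [3]_: features whose estimated precision grows without bound are effectively dropped from the model.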

@@ -721,12 +729,12 @@ weights to zero) model.
 The "lbfgs", "sag" and "newton-cg" solvers only support L2 penalization and
 are found to converge faster for some high dimensional data. Setting
 `multi_class` to "multinomial" with these solvers learns a true multinomial
-logistic regression model [3]_, which means that its probability estimates
+logistic regression model [5]_, which means that its probability estimates
 should be better calibrated than the default "one-vs-rest" setting. The
 "lbfgs", "sag" and "newton-cg"" solvers cannot optimize L1-penalized models,
 therefore the "multinomial" setting does not learn sparse models.

-The solver "sag" uses a Stochastic Average Gradient descent [4]_. It is faster
+The solver "sag" uses a Stochastic Average Gradient descent [6]_. It is faster
 than other solvers for large datasets, when both the number of samples and the
 number of features are large.
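
To make the two claims in this hunk concrete (a true multinomial model under these solvers, and "sag" for large datasets), here is a hedged sketch; the synthetic dataset and the solver settings are illustrative assumptions, not something this commit prescribes.

    # Multinomial logistic regression fitted with the "sag" solver.
    # Dataset size and parameter values are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                               n_classes=3, random_state=0)

    clf = LogisticRegression(multi_class='multinomial', solver='sag',
                             max_iter=200, random_state=0)
    clf.fit(X, y)
    print(clf.predict_proba(X[:3]))  # one softmax over all classes, so each
                                     # row is jointly normalized

With multi_class='multinomial' the probabilities come from a single softmax over all classes, which is why the doc text expects them to be better calibrated than stitching together per-class one-vs-rest estimates.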

@@ -778,9 +786,9 @@ entropy loss.

 .. topic:: References:

-    .. [3] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 4.3.4
+    .. [5] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 4.3.4

-    .. [4] Mark Schmidt, Nicolas Le Roux, and Francis Bach: `Minimizing Finite Sums with the Stochastic Average Gradient. <http://hal.inria.fr/hal-00860051/PDF/sag_journal.pdf>`_
+    .. [6] Mark Schmidt, Nicolas Le Roux, and Francis Bach: `Minimizing Finite Sums with the Stochastic Average Gradient. <http://hal.inria.fr/hal-00860051/PDF/sag_journal.pdf>`_

 Stochastic Gradient Descent - SGD
 =================================
