[DOC] Fix MDS images by Micky774 · Pull Request #22464 · scikit-learn/scikit-learn · GitHub

[DOC] Fix MDS images #22464

Closed
wants to merge 9 commits into from

71 changes: 57 additions & 14 deletions doc/modules/linear_model.rst
@@ -860,28 +860,71 @@ regularization.
that it improves numerical stability. No regularization amounts to
setting C to a very high value.

Binary Case
-----------

For notational ease, we assume that the target :math:`y_i` takes values in the
set :math:`\{0, 1\}` for data point :math:`i`. As an optimization problem, binary
class logistic regression with regularization term :math:`r(w)` minimizes the
following cost function:

.. math:: \min_{w, w_0} r(w) + C \sum_{i=1}^n \left(\log(1 + \exp(X_i^T w + w_0)) - y_i (X_i^T w + w_0)\right).

Once fitted, the ``predict_proba`` method of ``LogisticRegression`` predicts the
probability of the positive class,
:math:`P(y_i=1|X_i) = \operatorname{expit}(X_i^T w + w_0) = \frac{1}{1 + \exp(-X_i^T w - w_0)}`.
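
As a minimal sketch of how this maps onto the fitted estimator (the toy dataset
and the names ``clf`` and ``manual_proba`` below are only illustrative), the
positive-class probability can be recomputed from the ``coef_`` and
``intercept_`` attributes::

    import numpy as np
    from scipy.special import expit  # the logistic sigmoid 1 / (1 + exp(-t))
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Illustrative binary dataset; any two-class data works the same way.
    X, y = make_classification(n_samples=100, n_features=4, random_state=0)
    clf = LogisticRegression().fit(X, y)

    # P(y_i = 1 | X_i) = expit(X_i^T w + w_0), recomputed from the fitted parameters.
    manual_proba = expit(X @ clf.coef_.ravel() + clf.intercept_)
    assert np.allclose(manual_proba, clf.predict_proba(X)[:, 1])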

Multinomial Case
----------------

The binary case can be extended to :math:`K` classes, leading to multinomial
logistic regression; see also the `log-linear model
<https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_log-linear_model>`_.

.. note::
    It is possible in a :math:`K`-class context to parameterize the model
    using only :math:`K-1` weight vectors, leaving one class probability fully
    determined by the other class probabilities by leveraging the fact that all
    class probabilities must sum to one. We deliberately choose to overparameterize
    the model using :math:`K` weight vectors for ease of implementation and to
    preserve the symmetrical inductive bias regarding ordering of classes, see [1].
    This effect becomes especially important when using regularization.
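
For instance, a small sketch (using the iris dataset purely for illustration)
shows that the fitted coefficient matrix stores one weight vector per class
rather than :math:`K-1`::

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # One weight vector per class: K = 3 classes, 4 features.
    print(clf.coef_.shape)  # (3, 4)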

Let :math:`J_i` be the one-hot vector with a :math:`1` at element :math:`i` and
:math:`0` everywhere else. In the multinomial context with :math:`K` classes,
we define the target vector of :math:`X_n` as :math:`Y_n=J_t` where :math:`t`
is the true class of :math:`X_n`. Instead of a single weight vector, we now have
a matrix of weights :math:`W` where each vector :math:`W_k` corresponds to class
:math:`k`. Then we can define the vector of class probabilities :math:`z_n`
component-wise as:

.. math:: p(Y_n=J_k|X_n) = z_{n,k} = \frac{\exp (W_k^T X_n)}{\sum_j \exp (W_j^T X_n)}

Finding the weight matrix :math:`W` corresponds to solving the following
optimization problem:

.. math:: \min_W r(W) - C \sum_n Y_n^T \log(z_n),

where the logarithm is applied to :math:`z_n` element-wise.
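
As a plain-NumPy sketch of these quantities (the shapes and random data below
are hypothetical), the class probabilities :math:`z_n` and the data-fit term of
this objective can be computed as::

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_features, n_classes = 6, 4, 3
    X = rng.normal(size=(n_samples, n_features))
    W = rng.normal(size=(n_classes, n_features))
    Y = np.eye(n_classes)[rng.integers(n_classes, size=n_samples)]  # one-hot targets

    scores = X @ W.T                                                # W_k^T X_n for every n, k
    Z = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax, row-wise
    data_term = -(Y * np.log(Z)).sum()                              # -sum_n Y_n^T log(z_n)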

.. note::

    In the multinomial case, the regularization function internally flattens the
    matrix of weights into a vector. This is equivalent to concatenating the
    individual :math:`W_k` vectors. Thus, for a matrix :math:`W`, using
    :math:`\ell_2` regularization is equivalent to penalizing the squared
    Frobenius norm: :math:`r(W) = \frac{1}{2}\|W\|_F^2`.
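
A quick numerical check of this equivalence, using an arbitrary small matrix as
a stand-in for :math:`W`::

    import numpy as np

    W = np.arange(6.0).reshape(2, 3)  # small illustrative weight matrix
    w = W.ravel()                     # flatten, i.e. concatenate the W_k vectors
    # l2 penalty of the flattened vector equals the squared Frobenius norm of W
    # (up to the common 1/2 factor).
    assert np.isclose(w @ w, np.linalg.norm(W, "fro") ** 2)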

Regularization
--------------
We currently implement four choices of regularization term :math:`r(w)`:

#. None, :math:`r(w) = 0`
#. :math:`\ell_1,\, r(w) = \|w\|_1`
#. :math:`\ell_2,\, r(w) = \frac{1}{2}\|w\|_2^2 = \frac{1}{2}w^T w`
#. ElasticNet, :math:`r(w) = \frac{1 - \rho}{2}w^T w + \rho \|w\|_1`

For ElasticNet, :math:`\rho` (which corresponds to the `l1_ratio` parameter)
controls the strength of :math:`\ell_1` regularization vs. :math:`\ell_2`
regularization. Elastic-Net is equivalent to :math:`\ell_1` when
:math:`\rho = 1` and equivalent to :math:`\ell_2` when :math:`\rho=0`.
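
These choices map onto the ``penalty`` (and, for Elastic-Net, ``l1_ratio``)
parameters of :class:`LogisticRegression`. The following sketch shows one
possible configuration per regularization term; which solvers support which
penalties depends on the scikit-learn version::

    from sklearn.linear_model import LogisticRegression

    no_reg = LogisticRegression(penalty=None)  # r(w) = 0; older versions use penalty="none"
    l1_reg = LogisticRegression(penalty="l1", solver="liblinear")
    l2_reg = LogisticRegression(penalty="l2")  # the default penalty
    enet_reg = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5)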

Solvers
-------

The solvers implemented in the class :class:`LogisticRegression`
are "liblinear", "newton-cg", "lbfgs", "sag" and "saga":
Expand Down
16 changes: 6 additions & 10 deletions examples/manifold/plot_lle_digits.py
@@ -48,8 +48,11 @@
from sklearn.preprocessing import MinMaxScaler


def plot_embedding(X, title):
    X = MinMaxScaler().fit_transform(X)
    # Create a dedicated figure so each embedding gets its own plot.
    plt.figure()
    ax = plt.subplot(111)

    for digit in digits.target_names:
        ax.scatter(
            *X[y == digit].T,
@@ -175,15 +178,8 @@ def plot_embedding(X, title, ax):

# %%
# Finally, we can plot the resulting projection given by each method.
for name in timing:
    title = f"{name} (time {timing[name]:.3f}s)"
    plot_embedding(projections[name], title)

plt.show()