DOC: use notebook-style for plot_covariance_estimation.py by darioka · Pull Request #23150 · scikit-learn/scikit-learn
Merged · 5 commits · Apr 27, 2022
97 changes: 53 additions & 44 deletions examples/covariance/plot_covariance_estimation.py
@@ -13,51 +13,15 @@
:ref:`shrunk_covariance` estimators. In particular, it focuses on how to
set the amount of regularization, i.e. how to choose the bias-variance
trade-off.

Here we compare 3 approaches:

* Setting the parameter by cross-validating the likelihood on three folds
according to a grid of potential shrinkage parameters.

* A closed formula proposed by Ledoit and Wolf to compute
the asymptotically optimal regularization parameter (minimizing a MSE
criterion), yielding the :class:`~sklearn.covariance.LedoitWolf`
covariance estimate.

* An improvement of the Ledoit-Wolf shrinkage, the
:class:`~sklearn.covariance.OAS`, proposed by Chen et al. Its
convergence is significantly better under the assumption that the data
are Gaussian, in particular for small samples.

To quantify estimation error, we plot the likelihood of unseen data for
different values of the shrinkage parameter. We also show the choices by
cross-validation, or with the LedoitWolf and OAS estimates.

Note that the maximum likelihood estimate corresponds to no shrinkage,
and thus performs poorly. The Ledoit-Wolf estimate performs really well,
as it is close to the optimal and is not computationally costly. In this
example, the OAS estimate is a bit further away. Interestingly, both
approaches outperform cross-validation, which is significantly more
computationally costly.

"""

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

from sklearn.covariance import (
LedoitWolf,
OAS,
ShrunkCovariance,
log_likelihood,
empirical_covariance,
)
from sklearn.model_selection import GridSearchCV
# %%
# Generate sample data
# --------------------

import numpy as np

# #############################################################################
# Generate sample data
n_features, n_samples = 40, 20
np.random.seed(42)
base_X_train = np.random.normal(size=(n_samples, n_features))
@@ -68,8 +32,13 @@
X_train = np.dot(base_X_train, coloring_matrix)
X_test = np.dot(base_X_test, coloring_matrix)

# #############################################################################

# %%
# Compute the likelihood on test data
# -----------------------------------

from sklearn.covariance import ShrunkCovariance, empirical_covariance, log_likelihood
from scipy import linalg

# spanning a range of possible shrinkage coefficient values
shrinkages = np.logspace(-2, 0, 30)
@@ -83,8 +52,29 @@
emp_cov = empirical_covariance(X_train)
loglik_real = -log_likelihood(emp_cov, linalg.inv(real_cov))
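
The collapsed part of this hunk hides the loop that scores each candidate shrinkage value; a minimal sketch of evaluating the grid on held-out data is shown below (the name negative_logliks is an assumption, not taken from the file).

# Sketch, not part of the changed file; the variable name is hypothetical.
negative_logliks = [
    -ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test) for s in shrinkages
]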

# #############################################################################
# Compare different approaches to setting the parameter

# %%
# Compare different approaches to setting the regularization parameter
# --------------------------------------------------------------------
#
# Here we compare 3 approaches:
#
# * Setting the parameter by cross-validating the likelihood on three folds
# according to a grid of potential shrinkage parameters.
#
# A closed formula proposed by Ledoit and Wolf to compute
# the asymptotically optimal regularization parameter (minimizing a MSE
# criterion), yielding the :class:`~sklearn.covariance.LedoitWolf`
# covariance estimate.
#
# * An improvement of the Ledoit-Wolf shrinkage, the
# :class:`~sklearn.covariance.OAS`, proposed by Chen et al. Its
# convergence is significantly better under the assumption that the data
# are Gaussian, in particular for small samples.


from sklearn.model_selection import GridSearchCV
from sklearn.covariance import LedoitWolf, OAS

# GridSearch for an optimal shrinkage coefficient
tuned_parameters = [{"shrinkage": shrinkages}]
@@ -99,8 +89,17 @@
oa = OAS()
loglik_oa = oa.fit(X_train).score(X_test)
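
The collapsed lines of this hunk hide the grid-search and Ledoit-Wolf fits; a minimal sketch of how those two estimates are typically obtained follows (the names cv, lw, and loglik_lw are assumptions, not necessarily those in the file).

# Sketch, not part of the changed file; names are hypothetical.
cv = GridSearchCV(ShrunkCovariance(), tuned_parameters)
cv.fit(X_train)

lw = LedoitWolf()
loglik_lw = lw.fit(X_train).score(X_test)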

# #############################################################################
# %%
# Plot results
# ------------
#
#
# To quantify estimation error, we plot the likelihood of unseen data for
# different values of the shrinkage parameter. We also show the choices by
# cross-validation, or with the LedoitWolf and OAS estimates.

import matplotlib.pyplot as plt

fig = plt.figure()
plt.title("Regularized covariance: likelihood and shrinkage coefficient")
plt.xlabel("Regularization parameter: shrinkage coefficient")
@@ -145,3 +144,13 @@
plt.legend()

plt.show()
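
Most of the plotting code is collapsed in the hunk above; a minimal sketch of marking the three shrinkage choices on the likelihood curve is given below, reusing the hypothetical names introduced in the earlier sketches.

# Sketch, not part of the changed file; assumes the hypothetical names above.
plt.loglog(shrinkages, negative_logliks, label="Negative log-likelihood")
ymin, ymax = plt.ylim()
plt.vlines(lw.shrinkage_, ymin, ymax, color="magenta", label="Ledoit-Wolf estimate")
plt.vlines(oa.shrinkage_, ymin, ymax, color="purple", label="OAS estimate")
plt.vlines(
    cv.best_estimator_.shrinkage, ymin, ymax, color="cyan", label="Cross-validation"
)
plt.legend()
plt.show()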

# %%
# .. note::
#
# The maximum likelihood estimate corresponds to no shrinkage,
# and thus performs poorly. The Ledoit-Wolf estimate performs really well,
# as it is close to the optimal and is not computationally costly. In this
# example, the OAS estimate is a bit further away. Interestingly, both
# approaches outperform cross-validation, which is significantly more
# computationally costly.