[MRG+1] DOC A less-nested coverage of model evaluation #3527

Merged · 1 commit · merged Aug 5, 2014
121 changes: 54 additions & 67 deletions doc/modules/model_evaluation.rst
@@ -20,8 +20,10 @@ model:
This is discussed in the section :ref:`scoring_parameter`.

* **Metric functions**: The :mod:`metrics` module implements functions
-   assessing prediction errors for specific purposes. This is discussed in
-   the section :ref:`prediction_error_metrics`.
+   assessing prediction error for specific purposes. These metrics are detailed
+   in sections on :ref:`classification_metrics`,
+   :ref:`multilabel_ranking_metrics`, :ref:`regression_metrics` and
+   :ref:`clustering_metrics`.

Finally, :ref:`dummy_estimators` are useful to get a baseline
value of those metrics for random predictions.
@@ -42,7 +44,7 @@ Model selection and evaluation using tools, such as
controls what metric they apply to the estimators evaluated.

Common cases: predefined values
- --------------------------------
+ -------------------------------

For the most common use cases, you can simply provide a string as the
``scoring`` parameter. Possible values are:
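
For example, a predefined string can be passed directly; a minimal sketch,
assuming the 0.15-era ``sklearn.cross_validation`` import path::

    >>> from sklearn import svm, datasets
    >>> from sklearn.cross_validation import cross_val_score
    >>> iris = datasets.load_iris()
    >>> clf = svm.SVC()
    >>> scores = cross_val_score(clf, iris.data, iris.target, scoring='accuracy')
    >>> scores.shape  # one score per fold (3-fold CV by default)
    (3,)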
@@ -91,22 +93,31 @@ predicted values. These are detailed below, in the next sections.

.. _scoring:

- Defining your scoring strategy from score functions
+ Defining your scoring strategy from metric functions
-----------------------------------------------------

- The scoring parameter can be a callable that takes model predictions and
- ground truth.
+ The module :mod:`sklearn.metrics` also exposes a set of simple functions
+ measuring prediction error given ground truth and prediction:

- However, if you want to use a scoring function that takes additional parameters, such as
- :func:`fbeta_score`, you need to generate an appropriate scoring object. The
- simplest way to generate a callable object for scoring is by using
- :func:`make_scorer`.
- That function converts score functions (discussed below in :ref:`prediction_error_metrics`) into callables that can be
- used for model evaluation.
+ - functions ending with ``_score`` return a value to
+   maximize (the higher the better).

- One typical use case is to wrap an existing scoring function from the library
- with non default value for its parameters such as the ``beta`` parameter for the
- :func:`fbeta_score` function::
+ - functions ending with ``_error`` or ``_loss`` return a
+   value to minimize (the lower the better).

+ Metrics available for various machine learning tasks are detailed in sections
+ below.

+ Many metrics are not given names to be used as ``scoring`` values,
+ sometimes because they require additional parameters, such as
+ :func:`fbeta_score`. In such cases, you need to generate an appropriate
+ scoring object. The simplest way to generate a callable object for scoring
+ is by using :func:`make_scorer`. That function converts metrics
+ into callables that can be used for model evaluation.

+ One typical use case is to wrap an existing metric function from the library
+ with non-default values for its parameters, such as the ``beta`` parameter
+ for the :func:`fbeta_score` function::

>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
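
The resulting scorer can be passed anywhere a ``scoring`` value is accepted;
a minimal sketch, assuming the 0.15-era ``sklearn.grid_search`` module::

    >>> from sklearn.grid_search import GridSearchCV
    >>> from sklearn.svm import LinearSVC
    >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
    ...                     scoring=ftwo_scorer)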
@@ -138,6 +149,8 @@ from a simple python function::
* any additional parameters, such as ``beta`` in :func:`fbeta_score`.


+ .. _diy_scoring:

Implementing your own scoring object
------------------------------------
You can generate even more flexible model scores by constructing your own
@@ -154,24 +167,10 @@ the following two rules:
``estimator``'s predictions on ``X`` with reference to ``y``.
Again, higher numbers are better.
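
A minimal sketch of such a callable (the name ``my_scorer`` and the choice of
:func:`mean_squared_error` are illustrative, not prescribed)::

    >>> from sklearn.metrics import mean_squared_error
    >>> def my_scorer(estimator, X, y):
    ...     # Negate the error so that higher values are better, as required.
    ...     return -mean_squared_error(y, estimator.predict(X))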

- .. _prediction_error_metrics:
-
- Function for prediction-error metrics
- ======================================
-
- The module :mod:`sklearn.metric` also exposes a set of simple functions
- measuring a prediction error given ground truth and prediction:
-
- - functions ending with ``_score`` return a value to
-   maximize (the higher the better).
-
- - functions ending with ``_error`` or ``_loss`` return a
-   value to minimize (the lower the better).

.. _classification_metrics:

Classification metrics
- -----------------------
+ =======================

.. currentmodule:: sklearn.metrics

@@ -228,7 +227,7 @@ And some work with binary and multilabel indicator format:
In the following sub-sections, we will describe each of those functions.

Accuracy score
- ..............
+ --------------

The :func:`accuracy_score` function computes the
`accuracy <http://en.wikipedia.org/wiki/Accuracy_and_precision>`_, the fraction
@@ -271,7 +270,7 @@ In the multilabel case with binary label indicators: ::
the dataset.
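
A small, hand-checkable sketch (two of the four predictions match)::

    >>> from sklearn.metrics import accuracy_score
    >>> y_true = [0, 1, 2, 3]
    >>> y_pred = [0, 2, 1, 3]
    >>> accuracy_score(y_true, y_pred)
    0.5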

Confusion matrix
- ................
+ ----------------

The :func:`confusion_matrix` function computes the `confusion matrix
<http://en.wikipedia.org/wiki/Confusion_matrix>`_ to evaluate
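
For instance, with three classes (entry ``i, j`` counts samples of true class
``i`` predicted as class ``j``)::

    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> confusion_matrix(y_true, y_pred)
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])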
@@ -313,7 +312,7 @@ from the :ref:`example_model_selection_plot_confusion_matrix.py` example):


Classification report
- ......................
+ ----------------------

The :func:`classification_report` function builds a text report showing the
main classification metrics. Here is a small example with custom ``target_names``
@@ -348,7 +347,7 @@ and inferred labels::
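
A hedged sketch of the call; the printed per-class table of precision,
recall, F1 and support is omitted here, since its exact formatting varies
across versions::

    >>> from sklearn.metrics import classification_report
    >>> y_true = [0, 1, 2, 2, 0]
    >>> y_pred = [0, 0, 2, 1, 0]
    >>> target_names = ['class 0', 'class 1', 'class 2']
    >>> print(classification_report(y_true, y_pred,
    ...                             target_names=target_names))  # doctest: +SKIP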
grid search with a nested cross-validation.

Hamming loss
- .............
+ -------------

The :func:`hamming_loss` computes the average Hamming loss or `Hamming
distance <http://en.wikipedia.org/wiki/Hamming_distance>`_ between two sets
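
For example, one mismatch among four label positions gives a loss of 0.25::

    >>> from sklearn.metrics import hamming_loss
    >>> y_true = [2, 2, 3, 4]
    >>> y_pred = [1, 2, 3, 4]
    >>> hamming_loss(y_true, y_pred)
    0.25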
@@ -395,7 +394,7 @@ In the multilabel case with binary label indicators: ::


Jaccard similarity coefficient score
- .....................................
+ -------------------------------------

The :func:`jaccard_similarity_score` function computes the average (default)
or sum of `Jaccard similarity coefficients
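
For binary and multiclass targets this score coincides with
:func:`accuracy_score`, as in this small sketch::

    >>> from sklearn.metrics import jaccard_similarity_score
    >>> y_true = [0, 1, 2, 3]
    >>> y_pred = [0, 2, 1, 3]
    >>> jaccard_similarity_score(y_true, y_pred)
    0.5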
@@ -432,7 +431,7 @@ In the multilabel case with binary label indicators: ::
.. _precision_recall_f_measure_metrics:

Precision, recall and F-measures
- .................................
+ ---------------------------------

The `precision <http://en.wikipedia.org/wiki/Precision_and_recall#Precision>`_
is intuitively the ability of the classifier not to label as
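
A small binary sketch (one true positive, one false negative, no false
positives)::

    >>> from sklearn.metrics import precision_score, recall_score, f1_score
    >>> y_true = [0, 1, 0, 1]
    >>> y_pred = [0, 0, 0, 1]
    >>> precision_score(y_true, y_pred)
    1.0
    >>> recall_score(y_true, y_pred)
    0.5
    >>> f1_score(y_true, y_pred)
    0.66...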
@@ -639,7 +638,7 @@ Then the metrics are defined as:


Hinge loss
- ...........
+ -----------

The :func:`hinge_loss` function computes the average
`hinge loss function <http://en.wikipedia.org/wiki/Hinge_loss>`_. The hinge
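
A hand-checkable sketch with precomputed decision values; only the third
sample, at margin 0.09, contributes a nonzero term::

    >>> from sklearn.metrics import hinge_loss
    >>> y_true = [-1, 1, 1]
    >>> pred_decision = [-2.18, 2.36, 0.09]
    >>> hinge_loss(y_true, pred_decision)
    0.30...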
@@ -673,7 +672,8 @@ with a svm classifier::


Log loss
- ........
+ --------

The log loss, also called logistic regression loss or cross-entropy loss,
is a loss function defined on probability estimates.
It is commonly used in (multinomial) logistic regression and neural networks,
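
For instance, with binary labels and predicted class probabilities, the loss
is the mean negative log-probability assigned to the true class::

    >>> from sklearn.metrics import log_loss
    >>> y_true = [0, 0, 1, 1]
    >>> y_pred = [[.9, .1], [.8, .2], [.3, .7], [.01, .99]]
    >>> log_loss(y_true, y_pred)
    0.1738...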
@@ -725,7 +725,7 @@ The log loss is non-negative.


Matthews correlation coefficient
- .................................
+ ---------------------------------

The :func:`matthews_corrcoef` function computes the Matthews correlation
coefficient (MCC) for binary classes (quoting the `Wikipedia article on the
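
A small sketch (two true positives, one false positive, one false negative,
no true negatives)::

    >>> from sklearn.metrics import matthews_corrcoef
    >>> y_true = [+1, +1, +1, -1]
    >>> y_pred = [+1, -1, +1, +1]
    >>> matthews_corrcoef(y_true, y_pred)
    -0.33...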
@@ -761,7 +761,7 @@ function:
.. _roc_metrics:

Receiver operating characteristic (ROC)
- .......................................
+ ---------------------------------------

The function :func:`roc_curve` computes the `receiver operating characteristic
curve, or ROC curve (quoting
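
For a scalar summary, :func:`roc_auc_score` computes the area under this
curve; a small sketch with scores for the positive class::

    >>> from sklearn.metrics import roc_auc_score
    >>> y_true = [0, 0, 1, 1]
    >>> y_scores = [0.1, 0.4, 0.35, 0.8]
    >>> roc_auc_score(y_true, y_scores)
    0.75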
@@ -857,7 +857,7 @@ if predicted outputs have been binarized.
.. _zero_one_loss:

Zero one loss
- ..............
+ --------------

The :func:`zero_one_loss` function computes the sum or the average of the 0-1
classification loss (:math:`L_{0-1}`) over :math:`n_{\text{samples}}`. By
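
With the default normalization the result is the fraction of misclassified
samples; with ``normalize=False`` it is their count::

    >>> from sklearn.metrics import zero_one_loss
    >>> y_true = [0, 1, 2, 3]
    >>> y_pred = [0, 2, 1, 3]
    >>> zero_one_loss(y_true, y_pred)
    0.5
    >>> zero_one_loss(y_true, y_pred, normalize=False)
    2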
@@ -903,7 +903,7 @@ In the multilabel case with binary label indicators: ::
.. _multilabel_ranking_metrics:

Multilabel ranking metrics
- --------------------------
+ ==========================

.. currentmodule:: sklearn.metrics

@@ -912,7 +912,8 @@ associated with it. The goal is to give high scores and better rank to
the ground truth labels.

Label ranking average precision
- ...............................
+ -------------------------------

The :func:`label_ranking_average_precision_score` function
implements the label ranking average precision (LRAP). This metric is linked to
the :func:`average_precision_score` function, but is based on the notion of
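
A small sketch, where the per-sample precisions of 1/2 and 1/3 average to
roughly 0.42::

    >>> import numpy as np
    >>> from sklearn.metrics import label_ranking_average_precision_score
    >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
    >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
    >>> label_ranking_average_precision_score(y_true, y_score)
    0.416...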
@@ -955,7 +956,7 @@ Here a small example of usage of this function::
.. _regression_metrics:

Regression metrics
- -------------------
+ ===================

.. currentmodule:: sklearn.metrics

@@ -966,7 +967,7 @@ to handle the multioutput case: :func:`mean_absolute_error`,


Explained variance score
- .........................
+ -------------------------

The :func:`explained_variance_score` computes the `explained variance
regression score <http://en.wikipedia.org/wiki/Explained_variation>`_.
@@ -991,7 +992,7 @@ function::
0.957...

Mean absolute error
- ...................
+ -------------------

The :func:`mean_absolute_error` function computes the `mean absolute
error <http://en.wikipedia.org/wiki/Mean_absolute_error>`_, which is a risk
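
For example, the absolute residuals 0.5, 0.5, 0 and 1 average to 0.5::

    >>> from sklearn.metrics import mean_absolute_error
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> mean_absolute_error(y_true, y_pred)
    0.5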
Expand Down Expand Up @@ -1021,7 +1022,7 @@ Here a small example of usage of the :func:`mean_absolute_error` function::


Mean squared error
- ...................
+ -------------------

The :func:`mean_squared_error` function computes the `mean squared
error <http://en.wikipedia.org/wiki/Mean_squared_error>`_, which is a risk
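
For example, the squared residuals 0.25, 0.25, 0 and 1 average to 0.375::

    >>> from sklearn.metrics import mean_squared_error
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> mean_squared_error(y_true, y_pred)
    0.375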
@@ -1056,7 +1057,7 @@ function::
evaluate gradient boosting regression.

R² score, the coefficient of determination
- ...........................................
+ -------------------------------------------

The :func:`r2_score` function computes R², the `coefficient of
determination <http://en.wikipedia.org/wiki/Coefficient_of_determination>`_.
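
A small sketch on the same toy targets used for the error metrics above::

    >>> from sklearn.metrics import r2_score
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> r2_score(y_true, y_pred)
    0.948...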
@@ -1092,31 +1093,17 @@ Here a small example of usage of the :func:`r2_score` function::
for an example of R² score usage to
evaluate Lasso and Elastic Net on sparse signals.

.. _clustering_metrics:

Clustering metrics
======================

- The :mod:`sklearn.metrics` implements several losses, scores and utility
- function for more information see the :ref:`clustering_evaluation`
- section.
-
-
- Biclustering metrics
- ====================
-
- The :mod:`sklearn.metrics` module implements bicluster scoring
- metrics. For more information see the :ref:`biclustering_evaluation`
- section.
-
-
- .. currentmodule:: sklearn.metrics
-
- .. _clustering_metrics:
-
- Clustering metrics
- -------------------
-
The :mod:`sklearn.metrics` implements several losses, scores and utility
- functions. For more information see the :ref:`clustering_evaluation` section.
+ functions. For more information see the :ref:`clustering_evaluation`
+ section for instance clustering, and :ref:`biclustering_evaluation` for
+ biclustering.


.. _dummy_estimators: