MRG adding SomeScore objects for better (?!) grid search interface. by amueller · Pull Request #1381 · scikit-learn/scikit-learn · GitHub
MRG adding SomeScore objects for better (?!) grid search interface. #1381


Merged
59 commits merged on Feb 3, 2013
Commits (59)
801e265
ENH working on cross_val_score, trying to simplify unsupervised treat…
amueller Dec 16, 2012
50542dc
ENH better testing of old an new interface. Still a bit to do for uns…
amueller Dec 16, 2012
773c799
FIX usage of scores for unsupervised algorithms.
amueller Dec 16, 2012
1ec4a21
ENH use new api in permutation_test_score, don't use old api in testing.
amueller Dec 16, 2012
e6484c4
ENH fbeta score working, more tests
amueller Dec 16, 2012
46c2649
DOC-string for AsScorer
amueller Dec 19, 2012
053b6e7
ENH renamed ap and auc, added RecallScorrer
amueller Dec 19, 2012
e2e8d0b
DOC narrative docs for scoring functions. Put them next to GridSearch…
amueller Dec 19, 2012
ffa0b9d
ENH update example, minor fix.
amueller Dec 19, 2012
8111ec0
DOC improve cross validation and grid search docstring
amueller Dec 19, 2012
53f8973
FIX rename error
amueller Dec 19, 2012
7aaee87
DOC add whatsnew entry
amueller Dec 19, 2012
b256d13
DOC fixed formatting in user guide
amueller Dec 19, 2012
98b5d10
FIX example
amueller Dec 19, 2012
aee8e8d
DOC added a new template to sphinx so view the "__call__" function.
amueller Dec 19, 2012
2734bef
COSMIT address @ogrisel's comment.
amueller Jan 8, 2013
c04da31
FIX rename ZeroOneScorer to AccuracyScorer
amueller Jan 12, 2013
a728ee2
DOCFIX for zero_one_score / accuracy_score renaming
amueller Jan 12, 2013
fb8285a
DOC add narrative about score func objects to the model_evaluation docs.
amueller Jan 30, 2013
8511f48
ENH rename scorer objects to lowercase as they are instances, not cla…
amueller Jan 30, 2013
b118b93
DOC minor fixes in pairwise docs.
amueller Jan 30, 2013
4fc3f43
ENH/DOC add "score_objects" function for documenting the score object…
amueller Jan 30, 2013
394f87b
DOC add metrics.score_objects to the references
amueller Jan 30, 2013
e1d9376
DOC use table from score_functions docstring in model_evaulation narr…
amueller Jan 30, 2013
a92adad
DOC move scoring function narrative above dummy estimators, fix table…
amueller Jan 31, 2013
0d7e1a6
DOC minor fixes in score_objects documentation.
amueller Jan 31, 2013
e9556eb
DOC better table of score functions in grid-search docs.
amueller Feb 2, 2013
2d9cb81
ENH GridSearchCV and cross_val_score check whether the returned score…
amueller Feb 2, 2013
43e1c39
TST improve coverage of permutation test scores
amueller Feb 2, 2013
f650ad4
TST slightly better test coverage in cross_val_score
amueller Feb 2, 2013
374f81a
COSMIT built-in typo
amueller Feb 2, 2013
385e581
DOC some improvements as suggested by @ogrisel
amueller Feb 2, 2013
99f6a56
TST add test for pickling custom scorer objects
amueller Feb 2, 2013
4847227
DOC more improvements by @ogrisel
amueller Feb 2, 2013
b5cc3ba
COSMIT rename AsScorer to Scorer
amueller Feb 2, 2013
31254c2
COSMIT :: in rst is easier for syntax highlighters
GaelVaroquaux Feb 2, 2013
03aa748
DOC: minor formatting in model_evaluation.rst
GaelVaroquaux Feb 2, 2013
8db5750
DOC: minor rst issues
GaelVaroquaux Feb 2, 2013
9d8e6e9
DOC: misc rst formatting
GaelVaroquaux Feb 2, 2013
ee96e7d
MISC moved score_objects.py to scorer.py, added module level doc stri…
amueller Feb 3, 2013
6f79e47
DOC add kwargs in Scorer to docstring.
amueller Feb 3, 2013
51c3678
ENH add ``__repr__`` to Scorer
amueller Feb 3, 2013
318278e
DOC addressed @ogrisel's comments.
amueller Feb 3, 2013
0730e73
COSMIT text reflow
amueller Feb 3, 2013
e375677
MISC pep8: rename scorers to SCORERS, remove score_objects getter
amueller Feb 3, 2013
dbfd837
DOC remove duplicate table, add references to appropriate user guide …
amueller Feb 3, 2013
266fd0c
DOC add note on deprecation of score_func to whatsnew
amueller Feb 3, 2013
3d3a305
FIX imports for Scorer and SCORERS
amueller Feb 3, 2013
31fb403
DOC fixes in whatsnew, typo
amueller Feb 3, 2013
8ca3647
TST smoke test repr
amueller Feb 3, 2013
8 changes: 8 additions & 0 deletions doc/modules/classes.rst
@@ -662,6 +662,14 @@ user guide for further details.

.. currentmodule:: sklearn

Model Selection Interface
-------------------------
.. autosummary::
:toctree: generated/
:template: class_with_call.rst

metrics.Scorer

Classification metrics
----------------------

5 changes: 3 additions & 2 deletions doc/modules/cross_validation.rst
@@ -83,14 +83,15 @@ by::

By default, the score computed at each CV iteration is the ``score``
method of the estimator. It is possible to change this by passing a custom
scoring function, e.g. from the metrics module::
scoring function::

>>> from sklearn import metrics
>>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5,
... score_func=metrics.f1_score)
... scoring='f1')
... # doctest: +ELLIPSIS
array([ 1. ..., 0.96..., 0.89..., 0.96..., 1. ])

See :ref:`score_func_objects` for details.
In the case of the Iris dataset, the samples are balanced across target
classes hence the accuracy and the F1-score are almost equal.
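
A callable scorer object can be passed as well. As a minimal sketch (assuming
the ``Scorer`` helper introduced in this pull request and the ``clf`` and
``iris`` objects defined earlier in that document)::

>>> from sklearn.metrics import fbeta_score, Scorer
>>> ftwo_scorer = Scorer(fbeta_score, beta=2)
>>> cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5,
...                                  scoring=ftwo_scorer)  # doctest: +SKIP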

14 changes: 14 additions & 0 deletions doc/modules/grid_search.rst
@@ -49,6 +49,20 @@ combinations is retained.
This can be done by using the :func:`cross_validation.train_test_split`
utility function.

.. currentmodule:: sklearn.grid_search

.. _gridsearch_scoring:

Scoring functions for GridSearchCV
----------------------------------
By default, :class:`GridSearchCV` uses the ``score`` function of the estimator
to evaluate a parameter setting. These are the :func:`sklearn.metrics.accuracy_score` for classification
and :func:`sklearn.metrics.r2_score` for regression.
For some applications, other scoring functions are better suited (for example in
unbalanced classification, the accuracy score is often non-informative). An
alternative scoring function can be specified via the ``scoring`` parameter to
:class:`GridSearchCV`.
See :ref:`score_func_objects` for more details.
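
For instance, a minimal sketch of tuning for F1 instead of accuracy (the
estimator and parameter grid below are placeholders, not part of this pull
request)::

>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> # 'f1' is one of the string names accepted by the ``scoring`` parameter
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10, 100]}, scoring='f1')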

Examples
========
134 changes: 120 additions & 14 deletions doc/modules/model_evaluation.rst
@@ -297,7 +297,7 @@ In this context, we can define the notions of precision, recall and F-measure:

F_\beta = (1 + \beta^2) \frac{\text{precision} \times \text{recall}}{\beta^2 \text{precision} + \text{recall}}.

Here some small examples in binary classification:
Here some small examples in binary classification::

>>> from sklearn import metrics
>>> y_pred = [0, 1, 0, 0]
@@ -411,7 +411,7 @@ their support

\texttt{weighted\_{}F\_{}beta}(y,\hat{y}) &= \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (1 + \beta^2)\frac{|y_i \cap \hat{y}_i|}{\beta^2 |\hat{y}_i| + |y_i|}.

Here an example where ``average`` is set to ``average`` to ``macro``:
Here an example where ``average`` is set to ``average`` to ``macro``::

>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
@@ -427,7 +427,7 @@ Here an example where ``average`` is set to ``average`` to ``macro``:
>>> metrics.precision_recall_fscore_support(y_true, y_pred, average='macro') # doctest: +ELLIPSIS
(0.22..., 0.33..., 0.26..., None)

Here an example where ``average`` is set to to ``micro``:
Here an example where ``average`` is set to to ``micro``::

>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
@@ -443,7 +443,7 @@ Here an example where ``average`` is set to to ``micro``:
>>> metrics.precision_recall_fscore_support(y_true, y_pred, average='micro') # doctest: +ELLIPSIS
(0.33..., 0.33..., 0.33..., None)

Here an example where ``average`` is set to to ``weighted``:
Here an example where ``average`` is set to to ``weighted``::

>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
@@ -459,7 +459,7 @@ Here an example where ``average`` is set to to ``weighted``:
>>> metrics.precision_recall_fscore_support(y_true, y_pred, average='weighted') # doctest: +ELLIPSIS
(0.22..., 0.33..., 0.26..., None)

Here an example where ``average`` is set to ``None``:
Here an example where ``average`` is set to ``None``::

>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
@@ -492,7 +492,7 @@ value and :math:`w` is the predicted decisions as output by
L_\text{Hinge}(y, w) = \max\left\{1 - wy, 0\right\} = \left|1 - wy\right|_+

Here a small example demonstrating the use of the :func:`hinge_loss` function
with a svm classifier:
with a svm classifier::

>>> from sklearn import svm
>>> from sklearn.metrics import hinge_loss
@@ -653,7 +653,8 @@ variance is estimated as follow:

The best possible score is 1.0, lower values are worse.

Here a small example of usage of the :func:`explained_variance_scoreé` function:
Here a small example of usage of the :func:`explained_variance_score`
function::

>>> from sklearn.metrics import explained_variance_score
>>> y_true = [3, -0.5, 2, 7]
@@ -676,7 +677,7 @@ and :math:`y_i` is the corresponding true value, then the mean absolute error

\text{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| y_i - \hat{y}_i \right|.

Here a small example of usage of the :func:`mean_absolute_error` function:
Here a small example of usage of the :func:`mean_absolute_error` function::

>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2, 7]
@@ -705,7 +706,8 @@ and :math:`y_i` is the corresponding true value, then the mean squared error

\text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2.

Here a small example of usage of the :func:`mean_squared_error` function:
Here a small example of usage of the :func:`mean_squared_error`
function::

>>> from sklearn.metrics import mean_squared_error
>>> y_true = [3, -0.5, 2, 7]
@@ -740,7 +742,7 @@ over :math:`n_{\text{samples}}` is defined as

where :math:`\bar{y} = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} y_i`.

Here a small example of usage of the :func:`r2_score` function:
Here a small example of usage of the :func:`r2_score` function::

>>> from sklearn.metrics import r2_score
>>> y_true = [3, -0.5, 2, 7]
@@ -765,17 +767,121 @@ Clustering metrics
The :mod:`sklearn.metrics` implements several losses, scores and utility
function for more information see the :ref:`clustering_evaluation` section.


.. _score_func_objects:

.. currentmodule:: sklearn

`Scoring` objects: defining your scoring rules
===============================================
While the above functions provide a simple interface for most use-cases, they
can not directly be used for model selection and evaluation using
:class:`grid_search.GridSearchCV` and
:func:`cross_validation.cross_val_score`, as scoring functions have different
signatures and might require additional parameters.

Instead, :class:`grid_search.GridSearchCV` and
:func:`cross_validation.cross_val_score` both take callables that implement
estimator dependent functions. That allows for very flexible evaluation of
models, for example taking complexity of the model into account.

For scoring functions that take no additional parameters (which are most of
them), you can simply provide a string as the ``scoring`` parameter. Possible
values are:


=================== ===============================================
Scoring Function
=================== ===============================================
**Classification**
'accuracy' :func:`sklearn.metrics.accuracy_score`
'average_precision' :func:`sklearn.metrics.average_precision_score`
'f1' :func:`sklearn.metrics.f1_score`
'precision' :func:`sklearn.metrics.precision_score`
'recall' :func:`sklearn.metrics.recall_score`
'roc_auc' :func:`sklearn.metrics.auc_score`

**Clustering**
'ari' :func:`sklearn.metrics.adjusted_rand_score`

**Regression**
'mse' :func:`sklearn.metrics.mean_squared_error`
'r2' :func:`sklearn.metrics.r2_score`
=================== ===============================================

The corresponding scorer objects are stored in the dictionary
``sklearn.metrics.SCORERS``.
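
As a sketch of how these names map to scorer objects (assuming ``SCORERS``
holds the callable ``Scorer`` instances described below, keyed by the strings
in the table above)::

>>> from sklearn.metrics import SCORERS
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> clf = LinearSVC().fit(iris.data, iris.target)
>>> accuracy_scorer = SCORERS['accuracy']  # same object selected by scoring='accuracy'
>>> # a scorer is called as (estimator, X, y) and returns a single number
>>> accuracy_scorer(clf, iris.data, iris.target)  # doctest: +SKIP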

.. currentmodule:: sklearn.metrics

Creating scoring objects from score functions
---------------------------------------------
If you want to use a scoring function that takes additional parameters, such as
:func:`fbeta_score`, you need to generate an appropriate scoring object. The
simplest way to generate a callable object for scoring is by using
:class:`Scorer`.
:class:`Scorer` converts score functions as above into callables that can be
used for model evaluation.

One typical use case is to wrap an existing scoring function from the library
with non default value for its parameters such as the beta parameter for the
:func:`fbeta_score` function::


Member:
I would remove the trailing :: from the previous sentence and insert the following:

"One typical use case is to wrap an existing scoring function from the library with non default value for its parameters such as the beta parameter for the :func:fbeta_score function::"

Member Author:
yes, that is a very good remark.

>>> from sklearn.metrics import fbeta_score, Scorer
>>> ftwo_scorer = Scorer(fbeta_score, beta=2)
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)

The second use case is to help build a completely new and custom scorer object
from a simple python function::

>>> import numpy as np
>>> def my_custom_loss_func(ground_truth, predictions):
... diff = np.abs(ground_truth - predictions).max()
... return np.log(1 + diff)
...
>>> my_custom_scorer = Scorer(my_custom_loss_func, greater_is_better=False)
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=my_custom_scorer)

:class:`Scorer` takes as parameters the function you want to use, whether it is
a score (``greater_is_better=True``) or a loss (``greater_is_better=False``),
whether the function you provided takes predictions as input
(``needs_threshold=False``) or needs confidence scores
(``needs_threshold=True``) and any additional parameters, such as ``beta`` in
the example above.
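
As a further sketch, a metric that operates on decision values rather than
hard predictions (such as the ROC-AUC score listed in the table above) would
be wrapped with ``needs_threshold=True``; the exact call below is an
illustration, not a quote from this pull request::

>>> from sklearn.metrics import auc_score, Scorer
>>> # with needs_threshold=True the scorer feeds decision_function (or
>>> # predict_proba) output to auc_score instead of hard predictions
>>> auc_scorer = Scorer(auc_score, greater_is_better=True, needs_threshold=True)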


Implementing your own scoring object
------------------------------------
You can generate even more flexible model scores by constructing your own
scoring object from scratch, without using the :class:`Scorer` helper class.
The requirements that a callable can be used for model selection are as
follows:

- It can be called with parameters ``(estimator, X, y)``, where ``estimator``
is the model that should be evaluated, ``X`` is validation data and ``y`` is
the ground truth target for ``X`` (in the supervised case) or ``None`` in the
unsupervised case.

- The call returns a number indicating the quality of the estimator.

- The callable has a boolean attribute ``greater_is_better`` which indicates whether
high or low values correspond to a better estimator.

Member:
You could add a sentence such as:

"""
Objects that meet those conditions as said to implement the sklearn Scorer protocol.
"""

Having such an officially named, documented API will make it easier to have third party model selection and assessment software tools (for instance, for someone who would like to implement a randomized grid search that uses infrastructure such as http://picloud.com in a sister project with sklearn interoperability).

Member Author:
done

Objects that meet those conditions are said to implement the sklearn Scorer
protocol.
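
For illustration only, a from-scratch object that meets these requirements
might look like the following sketch; the complexity penalty and the
``coef_`` attribute it reads are assumptions made for the example, not part
of this pull request::

>>> import numpy as np
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> class PenalizedAccuracy(object):
...     # required boolean attribute
...     greater_is_better = True
...     # required (estimator, X, y) call signature returning a number
...     def __call__(self, estimator, X, y):
...         acc = np.mean(estimator.predict(X) == y)
...         return acc - 0.001 * np.count_nonzero(estimator.coef_)
...
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
...                     scoring=PenalizedAccuracy())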


.. _dummy_estimators:

Dummy estimators
=================

.. currentmodule:: sklearn.dummy

When doing supervised learning, a simple sanity check consists in comparing one's
estimator against simple rules of thumb.
:class:`DummyClassifier` implements three such simple strategies for
classification:
When doing supervised learning, a simple sanity check consists in comparing
one's estimator against simple rules of thumb. :class:`DummyClassifier`
implements three such simple strategies for classification:

- `stratified` generates randomly predictions by respecting the training
set's class distribution,
13 changes: 13 additions & 0 deletions doc/templates/class_with_call.rst
@@ -0,0 +1,13 @@
{{ fullname }}
{{ underline }}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}

{% block methods %}
.. automethod:: __init__
.. automethod:: __call__
{% endblock %}


7 changes: 7 additions & 0 deletions doc/whats_new.rst
@@ -11,6 +11,13 @@ Changelog
- Hyperlinks to documentation in example code on the website by
`Martin Luessi`_.

- :class:`grid_search.GridSearchCV` and
:func:`cross_validation.cross_val_score` now support the use of advanced
scoring functions such as area under the ROC curve and f-beta scores.
See :ref:`score_func_objects` for details. By `Andreas Müller`_.
Passing a function from :mod:`sklearn.metrics` as ``score_func`` is
deprecated.


.. _changes_0_13:

13 changes: 4 additions & 9 deletions examples/grid_search_digits.py
@@ -22,8 +22,6 @@
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.svm import SVC

print(__doc__)
@@ -46,16 +44,13 @@
'C': [1, 10, 100, 1000]},
{'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = [
('precision', precision_score),
('recall', recall_score),
]
scores = ['precision', 'recall']

for score_name, score_func in scores:
print("# Tuning hyper-parameters for %s" % score_name)
for score in scores:
print("# Tuning hyper-parameters for %s" % score)
print()

clf = GridSearchCV(SVC(C=1), tuned_parameters, score_func=score_func)
clf = GridSearchCV(SVC(C=1), tuned_parameters, scoring=score)
clf.fit(X_train, y_train, cv=5)

print("Best parameters set found on development set:")
3 changes: 1 addition & 2 deletions examples/plot_permutation_test_for_classification.py
@@ -22,7 +22,6 @@
from sklearn.svm import SVC
from sklearn.cross_validation import StratifiedKFold, permutation_test_score
from sklearn import datasets
from sklearn.metrics import accuracy_score


##############################################################################
@@ -43,7 +42,7 @@
cv = StratifiedKFold(y, 2)

score, permutation_scores, pvalue = permutation_test_score(
svm, X, y, accuracy_score, cv=cv, n_permutations=100, n_jobs=1)
svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)

print("Classification score %s (pvalue : %s)" % (score, pvalue))
